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ABSTRACT 


This thesis summarizes the theory of number theoretic 
transforms (NTT's), and presents original examples to 
illustrate the theory. Concepts have been studied and 
compared in order to present them in a cohesive and unified 
manner. 

Software and hardware implementation of Fermat number 
transforms are discussed and compared with the Fourier 
Transform showing a substantial improvement in efficiency 
and accuracy. The main drawback of Fermat Number Transforms 
is a rigid relationship between the allowed sequence length 
and word length. Methods and other NTT's, for overcoming 
this problem are discussed. The theory has also been 


extended to two dimensions. 
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I. INTRODUCTION 


With the rapid advances in large scale integration, a 
growing number of complex digital signal processing appli- 
cations are becoming economically feasible. In most cases 
the bulk of processing workload appears to consist of 
digital filter computation. Future progress in digital 
Signal processing, either towards high speed, real time 
operation or increased sophistication, thus aa depends 
on increased efficiency in digital filtering computation. 
This can be achieved not only by implementing improved 
filter Par ediite but also by using better computational 
algorithms as will be discussed in this thesis. 

Schonhaje and Strussen [1] (also see text by Knuth 
mm. 269)) defined Fourier-like transforms over the ring 
of integers, modulo the Fermat numbers [2] 2 +21, to 
yield convolutions. They showed that such convolutions 
can be used to perform fast integer multiplications. Rader 
13] and Agarwal and Burrus [4] also defined Fourier-like 
transforms over residue classes of integers, modulo the 
Fermat and Mersenne primes, to compute convolutions of the 
real integer sequences. 

For this presentation and exposition of Number Theoretic 
Transforms (NTT), the works of many different authors were 
consulted. Similar concepts have been studied and compared 


in order to present them in a cohesive and unified manner. 





In section II a mathematical framework for these new 
transforms is presented. This framework is based on the 
theory of congruences, modulo M, which belongs to the general 
area of what is often called "number theory." Number theory 
is very old, going back several thousand years; at least as 
far back as Euclid, who proved some of the original results. 
The names of many other famous mathematicians are also asso- 
ciated with the theory, including Joseph Lagrange (1763- 
1813), and Leonard Euler (1707-1783), and Carl Gauss (1777- 
1855) who is responsible for many contributions to the area, 
some of which were published in his book Disquisitiones 
Arithmeticae, published in 1801, when he was 24 [5]. An 
excellent history of the theory of numbers is found in the 
work of Dickson [6]. 

In recent years, there has been increasing interest in 
the practical applications of various parts of number theory, 
including the theory of reSidue number systems. There has 
been some work on the use of these number systems in general- 
purpose computers [7]-[12], although this line of investiga- 
tion has not yielded many practical results due to the 
difficulty of determining the sign of numbers expressed in 
residue number system notation. More promising results have 
been obtained in applications where sign detection is not 
required, such as number theoretic transforms [3], [4] 
and [13]. 

The possibility of performing the fast Fourier transform 


(FFT) in finite algebraic systems [14], [15] and [1], is 
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being increasingly discussed as a means of digital filtering 
[3], [13] and [16] (see also references in Reference [13]). 
The convolution property which such transforms share with 
the conventional £.f£.t. can be employed to construct a non- 
recursive digital filter in which rounding error does not 
occur. 

The following type of transforms have been discussed in 
the literature. 

(a) Transforms in arithmetic mod p, p prime, in which 

the order (number of points), d, for example, is 
a factor of p-l. 
(b) Transforms in arithmetic mod m, m arbitrary, in 
which d dividies p-1 for each prime factor p of 
m. 
(c) Transforms in an arbitrary finite Galois field 
GF (p”) of p” elements, where d divides pi. 
Here class (a) is treated as a special case of both (b) and 
(c), which are essentially different. This material is 
presented in section III which discusses the Fourier trans- 
form in a finite field in a broad way. 

The best known number-theoretic transform is the Fermat- 
number transform [1], [3], [4], [13], [16]. Due to the 
Simplicity of its arithmetic, such a transform is the fastest 
method known so far for computing integer convolutions 
under certain conditions. However, this transform suffers 


from the disadvantage that the restriction imposed on the 


jet 








register word length is often too excessive [13]. In 

order to remedy this problem, Rader [3] has suggested using 

a two-dimensional convolution scheme to convolve long one- 
dimensional sequences, and Agarwal and Burrus [17], [18] have 
presented such a two-dimensional convolution scheme. Other 
authors [19], [20], [21], [22], [23] have also considered 
other number-theoretic transforms (NTT). Section IV dis- 
cusses the various types of NTT. 

These transforms provide more choices of word length 
and transform length, at speeds which are not attainable by 
the Fermat-number transforms under similar conditions. 
Problems involved in the implementation of Fermat-number 
transforms are discussed in Section V. 

Section VI deals with the comparison between the FFT and 
the FNT. Software and hardware requirements for both are 
analyzed and compared. Agarwal [13] programmed Fermat 
number transforms on the IBM 370/155 computer [13] and showed 
how to compute convolutions approximately three times as 
fast as the FFT implementation for the same convolution. 
However, their main drawback is a rigid relationship between 
word length and obtainable transform length, as well as a 
limited choice of possible word lengths. This last point 
is especially significant for FNT's, and may result in a 
waste of computing power when the permissible word lengths 
do not correspond to the dynamic range required for the 


convolution. 
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In ect pie, Number Theoretic Transforms (NTT) could 
be implemented in the same was as Discrete Fourier Trans- 
forms with multiplications by trigonometric functions 
replaced by multiplications by powers of two, all operations 
being performed modulo a Mersenne or Fermat Number. When 
the transforms have a composite number of terms, as is the 
case with Fermat Number Transforms (FNT) or some pseudo- 
Mersenne Transforms [24], various pipeline computing tech- 
niques can be used [25]. 

In practice, however, direct transposition of Fast 
Fourier Transforms (FFT) architectures does not necessarily 
lead to optimum implementations and the development of 
Special configurations for computing NTT seems worth exploring. 
Along these lines, McClellan [26] has proposeda new coding 
technique for simplifying the implementation of Fermat 
Number Transforms. 

McClellan implemented a FNT convolver for rađar signal 
processing and reached some interesting conclusions. 
Leibowitz [27] presents a code translation which is mathe- 
matically simpler, and this proposed arithmetic provides 
simpler realizations of all operations required to compute 
the FNT. 

Nussbaumer [28] discusses the implementation of pseudo- 
Mersenne and Fermat Number Transforms. He shows that some 
pseudo-Mersenne Transforms can be computed efficiently by a 


linear filtering approach. This approach is extended to 
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cover the case of Fermat and pseudo-Fermat Number Transforms 
by using a special coding scheme for implementing arithmetic 
operations in a Fermat Number system. A number of sugges- 
tions have arisen for lengthening the sequences which can 
be handled by the NTT. One suggestion is to perform the 
calculation modulo several mutually prime modulo, and then 
obtain the desired result by using the Chinese Remainder 
theorem [13], [29]. Reed and Truong [30] have also shown 
how one can extend the method to Galois fields over complex 
integers modulo Mersenne primes to enable one to use the 
FFT algorithm to compute convolutions of complex sequences, 
and to lengthen the sequences which the method can handle. 
However, because this method requires several multiplications, 
it does not seem very promising. 
One of the most promising methods for lengthening the 
sequence one can E has been suggested by Rader [3] 
and developed by Agarwal and Burrus [31]. This consisted of 
mapping the one-dimensional Sequences into multidimensional 
sequences and expressing the convolution as a multidimen- 
Sional convolution. In Appendix A an explanation of the 
process of two dimensional convolution for convolving long 
= sequences is presented. By the use of an example it is 
shown that this process improves the length of the sequences 
handled by the NTT. 
With knowledge of the advantages and disadvantages of 
Fermat Number Transforms, it is possible to speculate on 


just what type of problems are likely to’ benefit from this 
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new approach. In general one looks for problems which have. 
the following characteristics: 

1) Fairly short sequences (about fifty delayed products) 

2) A high accuracy requirement 

3) Implementations where multiplications are very much 

more costly than additions. 

Two specific situations come to mind. The first is the 
estimation of magnitude spectra of (many simultaneous) 
wideliana Signals. The theory of power spectra estimation 
states that power density computations involves formulating 
a correlation function with a finite number of delays 
which are a fraction of the number of data points available. 

The second situation is two dimensional finite impulse 
response filtering. Here we may consider an arbitrary LxL 
impulse response to be applied to a large image. If Lis 
in the range of 52 points, the FFT is not particularly attrac- 
tive for convolution, although more so for two dimensional 
convolution than for one dimensional convolution. 

The Fermat Number Transform is quite effective, however. 
For impulse responses of this size, the number of multipli- 
cations is reduced by about two orders of magnitude over the 
direct method, in exchange for a number of additions which 
are not too different (usually less) than required for the 
direct method. Thus it can be expected that the Fermat 
number transform will soon play a part in the computation 


required for the filtering of pictures. 
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Recently, Derome [32] discusses a class of NTT's based 
on three bit primes, having many of the computational 
advantages of the FNT's. These NTT's can have much larger 
transform lengths than those for FNT's so that the fast 
convolution of, for example, a 1000x1000 point picture 
with a 24x24 point spread function should be possible in 
a minicomputer! This development was undertaken in connec- 


tion with analysis of high resolution electron micrographs. 
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II. MODULAR ARITHMETIC 


Let a and b be integers. We say that "a divides b" 
(denoted by "a|lb") and "b is multiple of a" if there exists 
an integer c such that b = ac. 

If a does not divide b, we denote the fact by "a/b". 
An important theorem concerning the division of integers 


(the Division Algorithm), is: 


Let a and b be integers, Benobszero,. Then there exist 


two unique integers, q and r, such that 


< b (2.1) 


a = bqaq+tr and 0 < F 


The integers q and r, are called the "quotient" and the 


"remainder" respectively. 


If a, b and M, are integers, with M > 0, such that 


M| (a - b) (2.2) 


we say that "a is congruent to b, modulus M", and we denote 


this fact, by writing 


(2:3) 


In other words, two integers a and b are congruent mod M, 


if M divides their difference. 
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We will refer to M as the "modulus". Note that 


1) 23 = 8 (mod 5) since 23 - 8 = 15 = 5-3 


and 


2) 23 


N 
NO 
O 
l 
uw 
A 


3 (mod 5) since 23 - 3 


are both true statements. 
We restrict our attention to what is called a complete 


residue system, mod M, as the set of integers 
AO Lp MS LE (2.4) 


au 


In other words, when an integer a is divided by another M 


e., 


where the remainder b, is some positive integer less than 


M, there exists a congruence 


a = b (mod M) (2.5) 


such that b is a unique integer among the numbers 
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Thus, one possible mechanization of residue reduction, 
is to divide by the modulus and keep only the remainder. 


Examples: 


PZA = 2 (mod 9) since E - 5 remainder = 2 
2) 81 = 0 (mod 27) since + = 3 remainder = 0 


Botn 2 and 0 are contained only once within the set of 
integers {0,1,...,8} and {0,1,2, ..., 26} respectively. 
The last example shows that, in general instead of saying 


that a number a is divisible by the number M we can write 


0 (mod M) 


W 
i 


~ 


For this means a O = a = Mk, where k is some integer. 


For instance, instead of saying that a is an even 


number, we can write 


a = 0 (mod 2) 


In the same manner one sees that an odd number satisfies 


a = 1 (mod 2). 


In working with residue reduction we will drop the 


symbol = for congruence and will use the symbol =. 
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But congruence is not the same as equality unless one 


can show separately that the difference between a and b is 


also less than M. 


The following basic arithmetic operations 


are permissible with modular arithmetic: 


a) 


b) 


€) 


d) 


e) 


f) 


Addition: 
Negation: 


Subtraction: 


Multiplication: 


Multiplicative 
inverse: 


Division: 


2+6 =8 = ] (mod 7) 
= -—=z77r 77 75 (mod 7) 


Spee ott Ces ica SSD + 7) 


= 5 (mod 7) 


3x6 = 18 = 4 (mod 7) 


Multiplicative inverse of an integer 


b in Z, exists if and only if b 


M 
and M are relative primes. In 
that case bxb | = 1 (mod M) 
27l = 4 (mod 7); 2x4 = 8 = 1 (mod 7). 


a/b exists if and only if b has 


an inverse. In that case 


a/b = el: 4/2 = 4x4 = 16 = 2 (mod 7). 


Note that because of the nature of modular arithmetic, 


numbers do not have sizeS or magnitude. One cannot say 


that a particular number is larger than another or that 


numbers are close. 


Extracting the residue is a functional transformation 


of a into b. 


It occurs often enough in what follows that 


it deserves a special symbol, <-> 
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<a> = b (2.6) 


The subscript M may be omitted if it is understood from 
the context. 

Computations involving residues are usually simple 
because it is never necessary to work with quantities 
larger than the modulus unless one finds it convenient. 


Notice that 
<x+y> is the same as <<x> + <y>> 
<x-y> is the same as <<x> - <y>> (2.7) 
<xy> is the same as <<x> <y>> 
so that in any computation involving only +, -, x one may, at 


x 
their option, replace the result of any step by its residue. 


Example: 


<15+13>, = <<15>, + <13> >, = <1+6>~ = 0 
<1l2-11>~ = <<12>., = <l1>.> = <5 = 4>. = 1 
<9 x 14>, = <<9>. x <14> > = <2 x 0>- = 0 


Note that if it were necessary to divide for residue 


reduction the operation would be quite costly. 
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Fortunately there are simpler techniques. In the 
simplest case - the residue or a decimal number modulo 10 


is its last decimal digit, since 


= i z I 
<a>49 =< )} a; 10°>,, <) <a, 107>> (2.8) 
1 
and <a, lo is zero except for i = 0 term. 
Example: 
3 
£ i 
023 rar 10, 
1=0 


<1x10° # 0x102 + 9x10 + 8x 10°> 


10 


<1098>,, = 8 (mod 10) 


This can be generalized to any radix. If Mis a power of 
two, and a is represented on a binary machine, one has a 


trivial method of extracting <a> y 


(2:9) 


This operation is performed by "masking out" all but the 


K least significant bits. 
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Example: 


2 2 
1 1 
ur: =e) a; <2->> = ) a, 2 
O 1=0 
m 0 1 2 
= <llll> 3 7 lx2 + 1x27 + 1x2 = 7 mod 8 
2 


i.e., here K = 3, and only the 3 least significant bits 
account for the value of the residue. The fourth bit (the 


most significant bit in this case) is "masked out." 


A. SOME IMPORTANT RESULTS IN MODULAR ARITHMETIC 

Euler's function is defined as Q(M), the number of 
integers in the finite set {0,1,2, ... M-1} (which is called 
the set of integers mod M, and denoted by 2 y) that are 
featively prime to M. 


For M a prime, 
$M) = M- 1 (TON 


Example: let M= 7. The ring of integers mod 7 is 


Zo = {0,1,2,3,4,5,6}. Since M = 7, is a prime, the 


integers in Z{ that are relatively prime to M = 7, are all 


7 


elements of the set (except zero), i.e. 


Il 
= 

i 
> 


(Di) 


ll 
~J 
i 
HE 
ll 
OV 


$ (7) 
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If M is composite and its prime factored form is 
denoted by 
2 E 


M = Py Po 72+ Po (2al L) 


then the general expression for q(M) is, [13] 


O o E) 012) 
Example: Let M = 12 = 22323 
Ze ao oe, 


12 


by simple counting one finds 4 numbers that satisfy the 
conditions of being relatively prime to M. Checking by 
applying equation (2.12) 


e a4 - 1) = 1) (2) = 
912) = 1241-5) (1-5 = 12(65)()) 4. 


An important theorem known as Euler's theorem states 


that for every a relative prime to M 
MD = 1 (mod M) Das) 
For M prime, this reduces to Fermat's theorem 


a = 1 (mod M) 12234) 
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which holds for all nonzero elements of Z., since they 


are all relative prime to M, 


Example: Let M = 


Here: 


There are certain 


7 


M-I 
0 


= 1 (mod M) 


1 (mod 7) 


64 = 1 (mod 7) 


729 = 1 (mod 7) 


4096 = 1 (mod 7) 


15625 = 1 (mod 7) 


46656 = 1 (mod 7) 


M 


if M is prime by assumption. 


which holds tor all 
non-zero elements 
of Z,, Since they 
are all relative 
primes to M. 


roots of unity that are of particular 
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interest. If N is the least positive integer such that 


aN = 1 (mod M) (is) 


then a is said to be a root of unity of order M, or simply 
of order N. 

If the order of a is equal to 0(M), then a is called a 
primitive root. 

If M is prime and a is a primitive root the set of 


integers 
K 
{ao (mod M), K = 0,1,2, ... M-2) (2.16) 


is the total set of non-zero elements in Z Thus all 


>» @ 
E 


nonzero integers in Z,, can be generated by powers of a 


M 


primitive root. This characterizes the entire field. 
Example: 


Let M = 7 
Z =0,1,2,3,41,55,6} 


7 


Looking in a table of primitive roots, for example [13] 


we get 
3 and 5 are primitive roots of 7, 


thus 
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ea nod eae) 2... 5 = M-2} 


eee ee rs: | 


(Mies 2 64-5) mod 7 
and those are all non-zero elements in 22. The same for 


E A 


Eo Don) 


Ma Ao Z2 3). mod 7. 


Euler's theorem implies that if a is of order N, then N 


must divide q(M), denoted by 
N| $ (M) (PASITO) 


If Mis a prime it can be shown [13], that roots of 


order N exist if and only if 


N| (M-1) (2.18) 


and the roots are given by [13], 


us (2.19) 


Qs g 


where a, denotes a primite root. 


$ 


= = 





Example: Let M = 7 0(7)) = M- 1 = 6 


Possible order of roots: N = 1,2,3,6 
primitive root (from table): 3 


i Order of roots: 





eS a= 3 = 3° = 6 mod 7 
a = 6 order 2 
TS ES mode A o 
| a = 2 order 3 a = 3 order (6 = $(7)) 


primitive root 


More generally, if a is a root of order N then [13] 


| o 


añ is of order N/K, if K|N 


(2.20) 
us is of order N, if N and K are 


relatively primes 


Example: Let M = 11 p(M) = 10 


Now a = 2 is a primitive root since a = 1 mod 11 and 


N = 10 is the least positive number such that 


N od 1 
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Then 


a = = 4 mod 1 is a root of order N=10 


This can be seen as follows: 





i.e., the root a= 4 generates a cyclic subset of the field 


With N = 5 distinct elements (order 5). 


23 = 8 mod 11 since N = 10 and K = 3 are relatively 


For a" 
primes a = 8 mod 11 will be a root of order N = 10, ora 
primitive root. 


Checking: 





Notice that (2.20) implies that the number of roots of 


order N, is given by $(M), and therefore the number of 


primitive roots is o(d(M)) (since for a primitive root 
N = $(M)). 
Example: Let M = 7 Z, = 1071,2,3,4,526) 
Number of primitive roots = d(6(7)) = (6) = 6 (1-9 (1-3) 
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as seen previously, 3 and 5 are primitive root mod 7. 


Number of roots of order N = 3: o0(3) = 3 - 1 = 2. 
e 3- =] > mod 7 A 
a = 2 order 3 ' a= 4 order 3 


So a = 2 mod 7 and a = 4 mod 7 roots of order N = 3. 
These relations will allow one to calculate all of the 
roots of all possible orders from one primitive root. 
We can summarize these ideas more precisely in the 
following: 
- The highest possible exponent to which a number can 
belong mod M is Q(M) 
(i.e. the highest order of a root is N = Q(M)) 
- If a number has order N = Q(M), we shall call it 
a primitive root for the modulus M. 
- Not every modul has primitive roots; for instance, 
for the modulus M = 15, one finds that every x in 


Z relatively prime to 15 satisfy the congruence 


M’ 
X = | (möd 15) 


and yet d(15) = 8. 

- To find the primitive roots of a modul if they exist, 
one must usually proceed by trial and error, although 
there are certain rules that may facilitate the search. 

Often one of the small number 2, 3, 5 or 6 may turn 


out to be a primitive root. 





- Extensive tables of primitive roots for primes 
have been computed. The first of these, the "Canon 
Arithmeticus" (1839) by K.G.J. Jacobi, included 
primitive roots for all primes below 1,000. More 
recent tables by Kraitchick, Cunningham, and others 
give primitive roots for all primes up to 25,000 
and even beyond. 
One interesting point, not mentioned so far, is the 
existence of multiplicative inverse. 


Multiplicative inverse of an integer b in Z, exists if 


M 
and only if b and M are relatively primes. 
In that case, bt is one integer such that 
bxb + = 1 (mod M) (Brom) 
Example: Let M = 7 
s+ =? 5x5 *=1m47 57 = 3 
Since 5x3 = 15 = 1 mod 7, i.e., if M is a prime for 


every non-zero integer a, in Z there exists an inverse 


M’ 
[13] 


a (2.21la) 


this can be seen by another example. 
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Example: 


Note that 


3]: 


if a and M 
Example: 


$ (12) 


2v=12 


So there a 


relatively 


1x1? 


1 = 1" = 1 mod 7 = 1 mod 
2772 = 2? = 32 = 4 mod 7 2x27} = 
3772 = 3? = 243 = 5 mod 7 3x3l= 
al2 = 47-1024 =2md7 4x41 = 
Ten ae a moda 1 550. = 
ioe ences 6 mca 7 6x6. = 


2x4 


XD 


4x2 


5X3 


6x6 


for a non-prime M, a has an inverse given by 
¢(M) -1 
Q 
are relatively primes. 
N = 12 = 2°x3 
> 2 ee 1) (4) = 
= 1211 5) cal 3) = 12 (5) (3) = 4 


a o 0 19819710, 115 


re (12) 4 elements in Z 


122 
prime to M= 12. 
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8 = 1 mod 7 
15 = 1 mod 7 
8 = 1 mod 7 
15 = 1 mod 7 
36 = 1 mod 7 
(2.22) 


(1,5,7,11) that are 





Those a's have inverses given by: 


-1 _ „s(12)-1 _ „4-1_ ,3 


an = 1 1 L = 1 mod 12 
=| u = 
lxl =e MOC sae = 1 mod 12 
a, = 5 esto seen a 25> = 5 moa 12 
-] = 
Br mod 12 
ete 7 = aa 
-] u 7 
IX]? = 7x7 = 1 mod 12 
= 11 i eee med Ad 
ier TT ieee med 12 


This suggests that in this case oa = b. Therefore, we 
verify that by considering M a composite rather than a 
prime, one observes several differences. 


If we define Z,, as the ring of integers modulo M, 


M 
then if in a ring of integers multiplicative inverses exist 
for all non-zero integers, this ring is called a field. 

A field with a finite number of elements is also 
called a Galois field. It is clear then, that Zu is a 
field if and only if M is a prime. 

ZM is not a field for a non prime M, since not all 


elements will have multiplicative inverses. Also, there is 


no primitive root that will generate the entire ring, fonly 


Subsets with ¿9(M) elements, at most.) 
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Example: M= 8 = 2 


subset with 4 elements 
root of order N= 2 
subset with 3 elements 
root of order N = 2 


subset with 4 elements 


N 
N 
N 
N 
N 
N 
N 
N 





root of order 2 


In these conditions, the previous example shows, that 
there are multiplicative inverses, only for those elements 
in Zo relative primes to M = 8. 

We can investigate, a little more, the arithmetic 

- modulo M, when M is not a prime. Let M have the following 


unique prime power factorization (2.11) 


E Y 


2 N 
M = Py aE > O 


when the arithmetic is done mod M, it is in effect done 
r: 
modulo each prime power Pi > Simultaneously [13]. 


A set of arithmetic operations can be done either 
ee 
modulo each Pi + separately and the final result mod M 


obtained using the Chinese remainder theorem [13], or 
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alternatively all the operations may be done mod M, but 


Ie 
they must be valid operations mod for each Py = 


In these conditions, an integer a is said to be of 


order N in Z, if and only if it is of order N in each 


M 


: 
pi 


Here we present some basic results. 
a = b mod M 


is true if and only if 


Ean 
a = b mod p, 5 i = 1,2,...2 (PROS) 


© a 
If we know the residues of an integer a modulo each Pi = 


We can uniquely reconstruct the integer a mod M using the 
Chinese Remainder Theorem. 


To establish this theorem, let 


oe 
= i 
a = aj mod Pi (2.24) 
a 
d, = M/(p, ~) (2.25) 
and 
-1 A ri -1 = 
d = (d; mod Py ) mod Pi (2.26) 
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then 


P 
pe eL 
a = ()d,d a,) mod M 
ro 
Example: 
Let a = 123 and M 24 
Calculation of a;,S: 
$ = 3 
a = 123 = ay mod 2 a 
a = 123 = a, mod 3 a 
Calculation of d,,s: 
r 
E ee 3 UE 
dı = M/P, = 24/2 = 
2 
d, = M/P, Ah” = g 
za 
Calculation of d,'s: 
-1 A "1-1 - 
d, = (d, mod Py ) = (3 mod 8) 
=i A Y2 -1 Z 
d, = (d, mod Po ) = (2 mod 3) 


then 


36 


(2.27) 


3 mod 8 


0 mod 3 


3 mod 8 


= 2 mod 3 


1 _ „M02 3-2 


| 
N 

I 
N 


A o 





a= (3x3x3) + (2x2x0) = 27 = 3 mod 24 


Check: 


a = 123 = 3 mod 24. 
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III. TRANSFORMS IN FINITE FIELDS 


The basic operations of signal filtering is convolution. 
In the discrete time signal processing situation, convolution 


takes the form 


co 


y(n) = x(n) * h(n) = ) x(m) h(n-m) (3.1) 


n= = 0 


Where h(n) is the response to a unit impulse, for a causal 


filter, then 


h(n) = 0 for me 


and, when the duration of the impulse response is finite, 


the infinite sum (3.1) reduces to a finite sum 


N-1 
yin) = ) x(m) h(n-m) (3.2) 
m=0 
where N is the length of the finite impulse response 
METR) filter. 
The processing workload required to evaluate (3.2) 
can be reduced significantly if direct computation is 


replaced by transforms methods. This is so provided the 
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application in question allows sequences to be processed 
in blocks. E 


The Discrete-Fourier Transform (DFT) is one of the most 


versatile transforms, and is defined by 


ID 

l 
er 
i 
=) 
A 


DFT X(K) = })} x(n) e (3.3) 


and its inverse transform by 


IDFT & x(n) = = ) X(K) e N oc) 
K=0 
m = 03 Es eee N-1 


where N is the length of the sequence which transforms one 
wants to calculate. 

In order to use the DFT in the high-speed implementation 
of convolution, one makes use of its cyclic convolution 


property, which states that 


DFT[h(n) * x(n)] = DFT[h(n)] - DFT[x(n) ] (3.5) 


and this implies that convolution can be implemented using 
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y(n) = IDFT{DFT[h(n)]-DFT[x(n)]} (3.6) 


The convolution implemented by (3.6) is called cyclic convo- 
lution since it evaluates (3.2) as if h(n) and x(n) were 
periodically extended outside the range [0,N-1], or equiva- 
lently, the indices were evaluated modulo N. Notice that 
normal finite convolution can be calculated by cyclic 
convolution if zeros are apended to x(n) and h(n) to prevent 
folding or aliasing [34]. 

The DFT is a transform of finite sequences, of real or 
complex numbers. One might ask if there are transforms in 
other number fields. That is, given a sequence of numbers 
modulo M, is there a transform that has the cyclic convolu- 
tion property?: One will seé that the answer to this question 
is yes. 

If one has a sequence of numbers of length N, then a 


transform of the form given by the pair 


N-1 
X(K) = JY x(n) aD OT 
n=0 
N-1 
O a aaa ae (3.7a) 
K=0 


is said to have a DFT structure [34]. 
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If one looks for the properties that a general trans- 
form (3.1) having the DFT structure must have to exhibit 
the cyclic convolution property, one finds [4], [35], that 
it depends on the existence of an a that is a root of unity 


Si order N, i.e., 


Q = 1 (3.9) 
and that n+ = = exists, i.e., the inverse of N exists. 
It has been shown [35] that inthe complex number field the 
E 
N 


conventional DFT with a =e is the only transform, with 
that property. However, by working in a finite field or 
ring of integers with arithmetic carried out an integer 
M, a large class of transforms exist [14] that have the 
cyclic convolution property. By special choices of the 
length N, the modulo M, and the value a, it is possible to 
have transforms with many interesting properties. These 
are called number theoretic transforns. 

The possibilitissof interest are, [14]: 


1) the ring of integers Z with respect to modulo M 


M’ 
2) the field of integers GF(p), with respect to prime 
modulus p 
3) the Galois field GF(p*) of p* elements. 
het Zu represent the ring of integers {0,1 ... M-1) (2.4) 
with arithmetic carried out mod M. Let M have the following 


unique prime power factorization: 
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i 
M = Py Po 2 ... Po E (2.11) 
where the p,'s are distinct primes. 

As pointed out previously (Section II), when one carries 
arithmetic mod M, one is in effect doing it modulo each 
Pi 5 simultaneously. Therefore, the length N number theoretic 
transform, having the cyclic convolution property in Zur 


must also have that property in 


This requires that a (mod Pi =e, an integer of order N 


must exist in Zr; , i.e., N is the least positive integer 


Da 
such that 

N Ey 

Q = 1 (mod Pi pela eee (3.9) 
Example: Let M = 24 = 2E 
then 

Z, = 2, = Korean ana. 

2 
and 

Z Zee 
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For (3.9) to be satisfied, since 


my, for zZ 


3 





order 2 


and 


mi) for Z 





(3.10) 

, 1.e., a= 1 order N 
a = 2 order N 

(3 LL) 


i.e., a = 1 order N= 1 


Then, a = 1 (mod 24) is a root of order N = 1, thus the 


length of the number theoretic transform possible in the 


ring Z7g! is N= 1 (not a very interesting result!). 


Furthermore, Since the inverse transform requires the 


existence of the inverse of N, 


i.e., ne this number should 


exist in Zr, , or N should be relatively prime to M (2.21). 
a 


i 
In this example, 


mi) for Z 
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one wants N. = 1 = ? 


By (2.21a) 


N = 1 = 1 = 1 = 1 mod 3 


(1i) for Z 


8 
i ; =] _ =] _ 

also required is N = 1 E 
By (2.22) 

A ee nae a= 
but 

> 2 1 = 

$ (M=8) = 8(1 - 5) = 4 
then 

ieee eid A E a) 


Thus we have verified the existence of a number theoretic 
transform of length N = 1, in the ring of integers 254° 
In general, the existence of an of order N, each gi 
can be investigated recalling Euler's theorem (2.13). 


That is, by observing (3.4) and Euler's theorem, one has 


the condition 
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r . 


N| $ (P; 2 ML, (3.12) 
za 
ie., N should divide $ (P; E 
Example: Let M= 11x31 = 341 
Then 
211 = 201352 ... 7 10) 
and 
231 = BORN 2772030) 
2), for 21] 
ó(11) = 11-1 = 10 , implies possible values of 
order N for the roots in 
211 ch: 2,520) 
mi) for 231 
$(31) -= 31- 1 = 30 , implies possible values of 


order N for the roots in 
231 rana oo. 10-30) 

That is, for (3.12) to be satisfied, in the conditions given 
by i) and ii), the possible values for the length of the 

NTT in 2341 are (1,2,5,10). 
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Notice that by (3.9), whatever the root in consideration, 


it has to be the same order in 21] as that in 231" 


For 2711 one constructs the following table. 


N AZ BA 56 7 8 9 10 


root of order 1 


n n n 5 (3 : 12a) 





root of order 5 


For 2311 a similar table is constructed, from which only 
a part is shown (notice that only N = 1, 2, 5, 10 are of 


interest). 


root of order: 1 
root of order 5 
root of order 30 


root of order 5 


root of order 2 








Thus the root a = 4 is of order 5 in both 217 and 2210 


and so the length of the NTT in the ring Z can be equal 


34 


to 5. Furthermore, since the multiplicative inverse of an 


integer b in Z,, exists if and only if b and M are relatively 


al 
prime, and for the inverse transform one requires N `, N 


M 


should be relatively prime to M (or p; 'S). 


In the last example: 


N = 5 is relative prime to N = 243 so 51 mod 341 

exists and is given by 

a od il 
but 

EA aa a =; 300 

- 10 31 

so 

But mod 341 = sol moa 341 = 5%” mod 341 


To find this number notice that 


299 = 256 + 32 +8 +2 +1= 2° + 2? + 27 + 21 + 24 


| 
SO 
E 256 „32 „8,.2 
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But 


Then 


So 


5% = 25 mod 341 

5% = 284 mod 341 
5% = 180 mod 341 
516 - 5 mod 341 
EME Oo mod 321 
5°4 = 284 mod 341 
5128 _ 180 mod 341 
BE = 5 mod 341 


so mod 341 = (5 - 25 - 180 +» 25 »- 5) mod 341 
= 273 mod 341 
Check: 
(5x 273) mod 341 = 1] mod 341. 
Bo A 273 mod 341. 
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o o 
o 


That is, it is verified that a number theoretic transform 
of length N = 5 exists in 2341" Now, the condition N 


relative prime to M (or P; 'S) means 


N| (P; ~ 1), A A (3.13) 


N| gcd {p,-1, SA p,-1} : 


O(M) is defined as the greatest common divisor (gcd) 


of the (Pi ale 


O (M) o Ll . (3.14) 


Therefore 


N|O(M) | (3215) 


This last equation gives the necessary condition for the 


existence of a transform of length N in the ring Zn with 


arithmetic carried out an integer M. 


Example: Let M= 11x31 = 341 


O(M) = gcd{11-1,31-1} = gcd{10,30} 


10 
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Notive that N = 5 satisfies the condition (3.15) since 
N = 5|(0(M) = 10.) 


It remains to be investigated further whether or not a 


given a of order N = 10 exists in both 211] and Z31* 


Now, consider the converse of condition (3.15). If 
i: r: 
N|O (M) or N|o(p; zis then there exists integers OL (mod Pi =) 


Br oder N in Zy, (ES). 
i 


t: 
A 1 
Using these a's one can construct transform (mod Pi =) 


which have the DFT structure (3.7) and are invertible. 
Combining these transforms by the Chinese remainder 
theorem (2.27), one can obtain a transform (mod M) having 


the cyclic convolution property in Z Alternatively one 


M` 
can combine the a,'s by the Chinese remainder theorem to 


obtain an a (mod M) of order Nin Zu and construct the 


final transform using this a. The results will be identical. 


Example: Let M= 5x17 = 85 
o(5) = 5-1 = 4 . possible values of N: 1,2,4 
(17) = 17-1 = 16 possible values of N: 1,2,4,8,16 


If one looks for an a of order 4 in 2 by constructing 


85“ 


tables for Z. and 217 as those in (3.12 a,b), one finds 


5 
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e 
ll 


2mod 5 (order 4, in Ze) 


4 mod 17 (order 4, in 217) 


e 
ll 


Using the Chinese remainder theorem (2.27): 


(i) d,'s calculation 


di MISAS => Li m0da .5..= 2. med 5 
d, = (5x17)/17 = 5 mod 17 
-1 


(ii) Calculation of di aS 


d} CE COR = a ee (by 2.2la) 
/ 
T- e a oa ay (by a) 
Then (by (2.27) 
a = ((2x2x3) + (4x5x7)) mod 85 
a = (12 + 140) mod 85 
a = 152 mod 85 = 67 mod 85 
Check: 
677 = mod 85 
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Notice that a = 64 mod 5 is of order 4 in Z and also 


57 
Q 64 mod 17 is of order 4 in 217° 


To establish the existence of a NTT in Z 


57 with length 


8 
N = 4, one has to find the inverse of N = at, i.e., 
at = > 
By (2.22) 
De end aes) = 85 (1 ==) (= ne 
5 17 
So 
al = „93-1 = 4° mod 85. 
Since 
63 = 32 +16 +8 +4 +2 +] 
403 u 432 ; 416 ; 48 . 44 ; 42 , al 
But 
ae = 4 mod 85 
Zee 
4 = 16 mod 85 
4 
4 = 1 mod 85 
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4 = ] mod 85 
416 = ] mod 85 
432 = 1 mod 85 
SO 
od 6d mod 55 
Check: 
4x64 = 1 mod 85. 


The fact that condition (3.15) holds as well as its 
converse [13] means that (3.15) is the necessary and 
sufficient condition for the existence of an invertible 
transform of length N which has the cyclic convolution 
property, mod M. 

This also establishes that the maximum transform length 
in Z,, is 


M 


NS = O(M) ©8216) 
Notice that for a given modulus one knows exactly what are 
the possible transform lengths in Zy 
For any NTT to be computationally efficient, there are 


three main requirements [17]: 
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(1) N, the transform length, should be highly composite 
(preferably a power of 2) for an FFT-type algorithm 
to exist, and should be large enough for practical 
sequence lengths. 

(ii) The multiplication by powers of 4(3.7) must be a 
a Simple operation. This is possible if the powers 


of a have a binary representation with few bits. 


(iii) To simplify the arithmetic modulo M, M should have 
a binary representation with very few bits and 
be large enough to prevent overflow. 
Although the class of all possible number theoretic trans- 
forms seems very large at first consideration, closer 
examination shows that very few seem to satisfy the afore- 
mentioned criteria. The parameters that must be chosen are 
M, N and a. 
Briefly, one verifies that if M is even, it has a factor 
of 2 and, therefore, O(M) and N ax are 1, which implies 
M should be odd. If M is prime then O(M) = M-1 which is as 
ce as one could hope for in a field of M integers. 
In section IV, the various types of NTT are discussed, 
but one can say that conditions ((3.8),(3.16)) do not give 
a systematic way of determining the "best" choices. Asa 
result one must use intuition, insight and a bit of searching. 
Usually M is selected and the resulting possible N and a 


are then examined. 
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IV. NUMBER THEORETIC TRANSFORMS 


One computational need in the digital processing of 
Signals is the evaluation of the circular convolution 
summation of two sequences of length N. That is, the 
evaluation of 


N-1 
y(n) = } x(n-m) h(m) (4.1) 


nn, Ss Oe NL 


The so-called fast convolution procedure obtains this 
sum by taking the inverse transform: of the product of the 
transforms of the two sequences. If the transform used is 
the discrete Fourier transform, then error-free results 
are obtained only if infinite precision arithmetic is 
assumed. This is true, even if both sequences are composed 
of finite precision numbers because the Fourier transform 
involves the irrational number exp[-j(27/N)]. 

One way to avoid the round-off errors induced by the 
transform is to make use of the fast transforms over the 
finite field [14]. 

By working in a finite field or ring of integers with 
arithmetic carried out modulo an integer M, a large class 
of transforms exist that have the cyclic convolution property 


(3.5). By special choices of the length N, the mod M, and 
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the value of a, it is possible to have transforms that need 
only word shifts and additions but no multiplications, that 
have an FFT type fast algorithm, that do not require 
storage of complex values of a, and that have no roundoff 
errors. These transforms are the Number Theoretic Trans- 
forms (NTT) and they look very promising in the evaluation 
of finite convolutions. Their main disadvantage seems to 
be a relation of the sequence length N to the required word 
length that can require Te word lengths for long sequences. 
This section begins by discussing Mersenne and Fermat 
number transforms, that proceeds historically all subse- 
quent work in this field. In part B, other number theoretic 
transforms that provide more choices of word lengths and 
transform lengths than Mersenne and Fermat number transforms 


are discussed. 


A. MERSENNE AND FERMAT NUMBER TRANSFORMS 

Rader [3] suggests performing the calculations of a 
transform with the DFT structure (3.7), in the ring of 
integers modulo a Mersenne number. Such numbers are defined 


by [3] 
p = 27 -ı (4.2) 


where q is prime, but p is not necessarily prime. 
In the ring p = 29-1, a = 2 is a root of the ger order 


since 
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Dee! + 1) = 1 mod 2-1 a) 


Under these conditions, a Mersenne transform of an 


integer sequence lan? having q terms is defined by [3] 


al 
a nK 
Zu ) a 2°) mod p 22) 
n=0 
K = 0,1 q 


Because q has an inverse modulo p [3], the inverse Mersenne 


transform will be 


) A, Sun, mod p (425) 


where all exponents and indices being taken modulo q and 
all operations being performed modulo p in both (4.4) and 
(4.5). 
It can be demonstrated [3] that the Mersenne transform 
satisfies the convolution theorem; that is to say, if 
{X} is the Mersenne transform of er then with Z, = A,-X. mod p, 


k K K 
the Inverse Mersenne transform (Za? of (Zg? is given by 


gq=-1 


Al ) a aaa) mod p (4.6) 


n=0 
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If fa} and {x} are properly bounded [3], 2, is equal 
to the output of the ordinary cyclic convolution with 
gi 


Zr ) a_x (4.7) 


n=0 


Under these conditions, digital filtering of real integer 
sequences can be performed by dividing the sequences into 
blocks, padding the blocks with zeros [34] to prevent folding, 
and aliasing and computing the cyclic convolutions by means 

of Mersenne transforms. 


The number of transform terms can be extended to 2q, 


since & = -2 mod p is a root of order 24: 
ee ee 
(4.8) 
Example: Let q = 7 
Then 
ee, 
and 
a = -2 mod 127 = (-2 + 127) = 125 mod 127. 
Notice that 2q = 14, so 
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„2 = 14 

Now, 

14 = 
Thus 

DE 
but 

LS 

De 

ies e 
and 

os 

Den 


Notice also that the 


14 ~ mod 


14 


(-2) 14 


mod 127 


125 


8+ 4+ 2 


125 125 125 


4 mod 127 


l6 mod 127 


2 mod 127 


2 16 4 mod 127 


128 mod 127 


1 mod 127: 


inverse of 2q 14, mod 127 is 


1025 


1212 2 14 ell 1 


127 = 14 (by 2.21la). 
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By performing some simple calculations, 


ae cen oan rec ja = 14%118 = 1 mod 127. 


Thus, a Mersenne transform exists which has the cyclic 
convolution property for sequences of length N = 2q, with 
a = -2 replacing exp[-217j/N] as the Nth primitive root of 
unity in (3.3), and with all calculations in (3.7) done in 
arithmetic modulo p. 

Rader advocated such a transform since uSing a = 2 
or a@ = -2 as a root of unity would necessitate only shift 
and add operations in computing the transform (3.7). 

Because Mersenne transforms are evaluated without 
multiplications, computation of a time-invariant circular 
convolution (4.1) having N = g points reduces to one multi- 
plication per output sample, as opposed to g multiplications 
With direct calculation. To compute a FFT of a sequence 
of length N = g requires of the order of (N/2) log, (N/2) 
complex multiplications [17]. 

The main limitations of Mersenne transform approach are 
related to the fact that the number of transforms terms 
q (or 2q) is not highly composite, since q is a prime. This 
means that calculations of the transforms cannot be simpli- 
fied by an FFT-type algorithm. 

If one considers p = 2° + 1 and K odd, 3 divides (2* + 1) 
and the largest possible transform is 2, thus one considers 


Only K even. 
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Let K= s ae s odd, t an integer. Then since p = 2k ss 


one has 


= n, nan integer. (4.9) 


And the length of possible transforms will be governed by 


t 
the length possible for 2° +1 (see, 3.15). Therefore 
integers of the form 
yt 
E + ol (4.11) 
CO IZ au 


are of interest. These numbers are called Fermat numbers and 
are defined by F, =P in (4.11) . Fort=0 to t=4 the Fermat numbers 
are prime. For t > 4 there are no en Fermat number prime. 
Number theoretic transforms with a Fermat number as 
the modulus, are called Fermat number transforms (FNT). 
By (3.15), for the FNT of length N to exist N must 
divide 0 (FL = p). 


Notice that (by 3.16), for F, prime 


E 
N = 0(F,) = 2 , b = 2 (4.12) 


and one can have FNT, for any length 


N = 27, m<b (4.13) 


el 


oO 





Example: Let p = F, = 2 41 = 2 +1 = 17 


If one constructs the following table: 


order 1 
order 8 
order 16 
order 4 
order 16 
order 16 
order 16 
order 8 
order 8 
order 16 
order 16 
order 16 
order 4 
order 16 


order 8 





order 2 


(4.13a) 
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one sees that (3,5,6,7,10,11,12,14) are primitive roots 


that will generate the entire field Zi The root a = 2 


7° 
is of order 8 and a“ = 2 s 4 is of order 8/2 = 4 (by 2.20). 
Also note that 11 = /2, in the sense that 11° = 2 mod 17. 


That is, for the ring 2171 one has the possibility of 
choosing a FNT of lengths (1,2,4,8,16) thus satisfying 
(4.12) and (4.13). 

For digital filtering applications the composites 
Fe (b = 32) and Fe (b = 64) Seem to be practical [4]. 


Lucas [6] has proven that every prime factor of a composite 


F is of the form 


t’ 


K 2 + 1 (4.14) 


Therefore, gtte divides 0 (Fi), Tor.’ E. >24, 
22 
Example: Consider F- = 2 + 1 = 4 294 967 297 
= 641x6 700 417 
O(F,) = g.c.d.{ (641-1), (6700 517-1) } 
by 
O(F,) = g.c.d.{640, 6 700 416} 
Fo: 
6 700 416 = 2/ 1300271774239 
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Then 


where t = 5. 


Therefore, for the choice of Fermat numbers with t > 4, 


the maximum possible transform length is given by (see, 3.16) 


ee as 


where 
t> 4, b = 2 


Agarwal [4] proved that 


is a root of order 4b, in Zn E AZ: 
t 


Notice that 


2 a oa 
a2 = 2P/2 (oP _ 9.9P/2 4 1) 
Ba 


Thus 


26/4 („6/2 _ 


- 


db (4.15) 


(4.16) 


2 


- 1) 


29/2 (_ A „B/2, 





a = 2 mod 2 +1. 


Since ae = 2 mod Poe the notation a = /2 for (4.16) 
will be adopted in this thesis following a general procedure. 
Also, note that any odd power of /2 will also be of 


Meder 2°** (by 2.20), i-e. 


ae = Je ; d odd (4.17) 


is a root of order ee And raising a = V2 (4.16) to 


Font m 
2 power one obtains an integer a' of order 2, 


m < t+2, A 


5 (t+2-m) 
Q = (4.18) 
a' being a root of order 2", m < 642, 
2° 4 
Example: Let p = E, = 2 +1=2+41= 17 
L.e. 
ee ee 
Then 
De A AA e aa) 
a = 2(3) = 6 mod 17 
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is a root of order 4b = 4°4 = 16. Observing (4.13a) one 


sees that this result is correct. 


Notice also in (4.13a) that 6 = /2 in the sense that 


a = 2 mod 17. So 


a = yY2 = 6 mod 17. 


If one raises this a to 2 (tt2-m) m= 3 cae = 4 
l.e. 
„(2+2-3) > 
e = Be = 2 mod 17 
m 3 
and a = 2 is a root of order 2 =2 <= 8 (4.18) 


Observing (4.13a) one verifies that this is also a 


correct result. 
If one raises a = 6 to an odd power say d= 5, 
a = 6 = 7 mod 17 is a root of the same order as a = 6, 


namely of order 4b = 16, verifying (4.17). Observing (4.13a) 


one verifies that this is correct. 


Notice that a = 2 mod (2°41) is of order N = 2b, since 


But 
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2 II el mod (2° + 1) (4.19) 


Thus for FNT's with a prime or composite modulus one sees 


that a = 2 (or a power of 2) is a possible root of order 


Mp = 2.2% = 28*!. 


This means that sequences up to a length N = 2b, in 


this case, can make use of FNT. 


? , ; ; ! + 
This 1S a very desirable situation, since N = >= l 


is highly composite allowing an FFT type algorithm and all 


multiplication by powers of a are simple word shifts. 


If a = /2 is used then sequences of length N = 4b = ade 


are possible (4.16). Recalling that Fermat numbers up to 


E, are prime O(F(t)) = 239 (by 3.14), and in these cases one 


can have an FNT for any length N = Dea SO 


Notice that, for these Fermat numbers, a = 3, iS a 
root of order N = 2 (see 4.13a), allowing the largest 


possible length, in the correspondnet ring Z 


Fr 


The following table shows some parameters for several 


possible implementations for FNT's: 


ers N = 2b N = 4b N a for 

t peo E A max y 
max 
2 4 2241 8 16 Te NE 
10,11,12,14 

3 8 2241 16 32 256=2 3 
4 16 216 4 1 32 64 65536-2? 3 
5 32 22 +1 64 128 128 /Z 
6 64 2% 41 128 256 256 /2 
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That is, a = Y2 and the resulting N = 4b give the maximum 
length possible for Fe and Foe However, for prime Fy 
further increases in N are possible up to N = 2? if more 


stages of the FFT algorithm are allowed to have multiplica- 


tions rather than simple word shifts. 


Notice also that besides a = 3 being of order N = De 
there are ee - 1 other integers of order 2°, since: 
ey, 


and the number of primitive roots in Z is given by (see 


Er 
section II) 
1 b- 
UM) = 2701-5) = 25. 
For F, = 17, one sees that the number of primitive 
roots is 
4-1 
2 OOO LA A > 


The cyclic nature of modular arithmetic means that, with- 
Out a priori knowledge, integers cannot be associated with 
Magnitude. For example, the days of the week represent a 
modulo 7 system, so that the statement "Friday is after 
Wednesday" has no meaning unless the week for the Friday 


and Wednesday in question is specified. 
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This means that any number theoretic transform unlike 
the DFT has no physical significance like "frequency." 

It is merely a transform space, the halfway stage in convolu- 
tion. 

During various stages of the computation of an NTT, 
each accumulation of signal "overflows" many times. 

But still the end result of the convolution will be 
exact if the input Signals are properly bounded [17]. That 
is, a dynamic range constraint iS imposed by the modular 
arithmetic. One must be able to bound a priori the result 
of convolution in order to determine the true answer from 
the answer modulo Fie 


The worst case bound is determined by the following 


procedure. If 


Cc = a (Mb 
n n n 


where (N) means circular convolution of length N, and 


Da 
Ja | < 2 
A = 
(4.21) 
b 
IS a2 b 
n o 
then 
b +b, 
le || Ss a (4.22) 
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Example: 
The circular convolution of two sequences requiring 8 
bits plus sign, will require at most, 22 bits plus sign to 


represent the output sequence. Since 


N = 6 = 2° Je] < 2%. 2848 
n = 
6 16 
Ds es ei, EZ 
n =— 


Arithmetic modulo E, can be implemented using b = 2 


bit representation of integers with some provision for 
representing 29. 
Section V deals with the implementation of Fermat Number 


Transforms, where arithmetic is carried modulo F, = 2P + 1, 


t 

Notice that the maximum length of sequences which can 
be cycled convolved using the FNT with a = 2 is N = 2b 
(N = 4b for a = V2), and therefore the length of sequences 
which can be convoled is proportional to the word length 
in bits (b). 

Thus for long sequences the word length requirement 
may be excessive. 

Rader [3] suggested uSing a two dimensional convolution 
scheme to convolve long one dimensional sequences and 


Agarwal and Burrus [17, [18] presented such a two dimensional 
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convolution scheme. Using this scheme, cyclic convolution 
of length N = LP is implemented as a two dimensional cyclic 
convolution of length 2LXP. 

This two dimensional cyclic convolution can be imple- 
mented using a two-dimensional FNT. Then the word length 
required is proportional to the square root of the length 
of the sequences to be convolved [13], which would give 
for a maximum sequence length sb“ rather than 4b (for a = /2), 
ie., for a computer's word length b = 64, the maximum length 


for the transform will be 


A ee 8b = 32 768. 

Appendix A illustrates with an example such a scheme and 
also several other points: treatment of negative values in 
data, the structure of the transform and the inverse matrix, 
negative powers of a, frequent "overflow" during computa- 
tions, meaninglessness of the transform values and exactness 
of the final answer. This example will not demonstrate the 
efficient implementation of the FNT using the binary 


arithmetic. 


B. OTHER NUMBER THEORETIC TRANSFORMS 

Mersenne and Fermat number transforms are-very promising 
for digital filter computation because they can be calcu- 
lated without multiplications. Their main drawback is a 


rigid relationship between transform length and wordlength, 
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caused by the fact that all operations are performed ina 
finite ring with arithmetic carried out an integer M. 
Another difficulty arises because it is not possible to 
achieve simultaneously optimum efficiency in reducing the 
number of operations and in implementing arithmetic opera- 
tions. This is so because Fermat number transforms are 
amenable to a fast transform algorithm, and Mersenne trans- 
forms are not, whereas arithmetic operations can be imple- 
mented more efficiently modulo a Mersenne number than 
modulo a Fermat number [3], [13]. 

In what follows, other number theoretic transforms will 
be described briefly. Such transforms provide more choices 
of word length and transform length, thus enlarging the 
possibility of the use of NTT in digital filtering. 

l. Transforms Over the Galois Field GF (p°) 

Reed and Truong [20] generalized the ideas of 
number theoretic transforms to transforms over the Galois 


Field GF (pô where p is a prime Mersenne number, i.e. 


Pu ee a) See oy ad aa aa ola aLa l a 
(4.23) 
Notice that this is a particular case of the definition (4.2), 
in the sense that here one is interested only in p, a prime. 
Also, the Galois field GF (pS) is a particular case of the 
more general GF (př), where one is interested in the case 


n= 2. 
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Let GF (p^) denote the Galois Field (finite field) of 
p elements, where p = 22 - 1, p.and g primes. Let d be 
a divisor of ps - 1 (possibly d = ps - 1). Also let the 


element r € GF (p?) , generate the cyclic subgroup of d 


elements, 
G Sb po a-r , l) (4.24) 


Then a transform over this subgroup G., can be defined by 


d 
the following pair [20] 
dal 
Kn 
er tor 0 < K < adl (4.25a) 
n=0 
and 
der 
= el -Km u 
a = (a) , A,r“, for 0<m< d-l (4.25b) 
K=0 
Sr 2 2 
where d divides p - 1, a and Ay. are elements of GF(p } 


and r is a generator of the element subgroup Ga 
It can be shown that the cyclic convolution property 
holds for this transform [20]. 


Now, if 


xX = -] mod p (4.26) 


73 





is not solvable, then the nonsolvability of (4.26) is 


equivalent to the statement: 


(-1) is a quadratic nonresidue mod p 


(see Appendix B, eq. B-2). 


By Euler's criterion (Appendix B, theor. B.3), this is 


further equivalent to: 


Eh = ay rn? e 
e (4.27) 
- (-1) (2D -1 
where 
=) is the Legendre symbol defined by [see Appendix 


B, eq. (B.8) ] 


+1 if a is a quadratic residue mod p 


=] if is a quadratic nonresidue mod p 


| 


Thus (-~l1) is a quadratic nonresidue mod p and (4.26) is 


not solvable in GF(p); the polynomial 


Ce (4.28) 


74 





is then irreducible in GF(p). A root say, i, of 
2 


p(x) = x’ +1 = 0 (4.29) 


exists and can be found in the extension field GF (p*) [20]. 


The Galois field GF (p°) is composed of the set 

GF (p?) = {a+ Îb|arb € GF(p)} (4.30) 
where i is a root of (4.29), satisfying 

A == il (4.31) 


(p-1) mod p. 


where -1 
If x? + 1 = 0 is not solvable in GF(p), 1 € GF (p*) 
plays a similar role over the finite field GF(p) that 
y-1 = i plays over the field of rational numbers. 
For example, suppose (a + ib) and (c + id) are elements 


BE GF(p*), then by (4.31) 
Ben e a) = (a = c) 4+ it = a) (4.32) 


ac + ¡“ba + ibc + iad 


(ii) (a + ib) (c+ ia) 
(4.33) 


= (ac - bd) + i(bc + ad) 
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These are the analogues of what one might expect if (a + ib) 


IN 


and (c + id) were complex numbers. In applications to 


radar and communication systems one generally wants to 


take convolutions of complex numbers. 


If (-1) is a quadratic nonresidue mod p, then convolu- 


tion (4.20) of the complex integers can be performed with 


transforms of type (4.25) on a Galois Field GF (p?) where 


a and Dr are restricted to GF (p*). In other words, if 
an! Bo € GF (p^), for n = 0,1,2, ... d-1 the transforms are 
da= 
= Kn 
NS ) a, Y 
n=0 
al 
By = ) Bo yon ’ for K = 0,1, ... d-l. 
n=0 
where r 1S a generator of ad-element subgroup Ga: 
Then taking the inverse transform of 
Cr = Ar , By Tor TK == 0) LL) ce. dl Mt.3%) 
one obtains 
d= 
u -] -Kn 
Se atc) ) E (4.35) 
K=0 


where En € GF (p?) 


and d divides p al 
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If p is a Mersenne prime, the order t of the subgroup 
with generator a, factor as follows [20]: 


~ 


Ben 224 -2-2141 -1 


E A A oe (4.36) 
pe 2atl (929-G-1 Seen 


_ 24tl 2971 =o) 


Since t has the factor d = 2a the usual FFT algorithm 
can be used to calculate transforms of as many d = > 


points. If 
d= 2, ich org tek (4.37) 


and a is a primitive element of GF (p^), then the generator 


of Ga is 
r= a (4.38) 


In (4.25) if one wants to take the transform over GF(p*), 
of parl point sequences of complex integers, a, € GF (p°) 
then one needs to find a primitive element in G q+1 of 

2 


GF (p°). To achieve this, the following theorems are useful. 


For proofs, see [20]. 
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Theorem 1: 
If p is a Mersenne prime and d = a where 
IIS cl then r = a t ib is a primitive 
element in Gg of aoe if and only if 


Era. (4.39) 


Theorem 2: 
For Mersenne primes p > 3, the first 
quadratic nonresidue modulo p in the 


sequence 1,2, 3, .2.., p-l, is 3. 


To find a primitive element in G 


1 of GF (pî), one can use 
2 


q+ 


the following procedure. 


Assume an element r = a + ib is of order S in o 


Now 


ae Bm = (a+ fb) (a + 12 71 (4.40) 
and, it can be proved [20] that 
a 2) ane sy moa p. (4.41) 
Since 
A AAA A 
= ay a a (4.42) 


78 





eee chat G = 2, 3, 5, 7, 13, 17, 19, 31, 61, ... i.e., 


prime. So that (4.40) becomes 


A q 
la + am) 


(a + ib) (a - ib) mod p 


ae + p* mod p (4.43) 


By Theorem 1, it follows that 


ee ee ner cd ena p (4.44) 
since 

d = ga and a = 2%. 
Let 

Ze a? mod 2% - 1 

(4.45) 

Y = -b%mod 21 - 1 
then (4.44) becomes 

X+ l = Y mod p (4.46) 


By đefinition in (4.46), X is a quađratic residue. For 


Y, [see Appendix B, eq. B.ll, and B.9] 
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2 2 
Y = -b = -l, ,b _ -1 
(5) ( D ) =a Co 
also (Appendix B, corollary B.5) 
-1 (24-1-1) /2 oa] 
( ) = (-1) = (-1) = -] 
P 
thus 
Y = X+ 1 is a quadratic nonresidue. 


Hence, one way to choose the ers X and Y is to let X 
and Y be two consecutive numbers from the set of integers 
ee), ces 212, such that the first number X is a square and 
the second is a nonsquare. By Theorem 2, for p > 3, Y = 3 
is a nonsquare and the preceding element X = 2 iS a square. 


Thus it is sufficient to let 


2 mod 21 - 1 (4.47a) 


© 
u 
e 
N 


bf = -X-1 = -Y = -3 mod 27-1 (4.47b) 
To find the solution of congruence (4.47a) one uses the 
following procedure [20]. 


First, notice that [20] . 


Se (4.48) 
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Then by Euler's criterion [Appendix B] 


MAS 


(EZ) 1 mod (24-1) (4.49) 


= 


Multiplying both sides of the congruence by 2, then 


q_ ca 
g((2%-2)/2)+1 _ 22 = 2 mod (29-1) 


Hence 


q-2 
Z 2 2 mod (27-1) (4.50) 


No 
Wl 


Then the solution for (4.47a) is 
2172 


a = +2 mod PRE | (4.51) 


Using the same procedure for (4.47b), b is given by [20] 


G=-2 dei 
b = ¢ (-3) 7 mod (4.52) 


In Ga of GF (p°) there always exists a primitive element, 


r= a + ib [20]. By Theorem 1 (4.39) 
Cee) ee ed a (4.53) 


Raising both sides of (4.53) to the jth power, (4.53) becomes 
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((a + ip) 4/2, 3 = (-1)3 mod p (4.54) 


By Theorem 1, ((a + ip) Y/2, 3 is a primitive element only 
when j is an odd number. The elements are distinct and 
include all primitive elements of Gq: 
In other words, in the cyclic subgroup of Ga of EP(p), 
if (a + ib) is a primitive element then (a + ibi- is also 


a primitive element for j = 1,3,5, ..., d-l. 


Assume & is of order a in EEN If d divides 
2+1 


gar then r = a /d is of order d in ce 
For the transform in (4.25), with d a factor of ye 
3141 3 


one can take r = 4 as the primitive element and d 


as the transform length. 


Example: Find all primitive elements expressed as a sum 


of powers of two in the subgroup Ga of GF (p*), 


shes G@ 28, js So n e a add a. 
To do this, first find an element with order 
Meo = 2°) = G64 in GR(31°). 
According to Theorem 1, if r = (a + ib) is a primitive 


element in G 6 of GF(31%), then 
2 


E 93 
(a + ib) = =-=] mod 31 


By (4.44), that becomes 


e E T mod 31 
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By Theorem 2, 3 is a nonsquare and 2 is a square. By (4.51) 


2772 2? 8 


HI 
No 
HN 
N 
H 


a = 2 8 mod 31 
By (4.52) 


b = (-3) = (-3) = (28) = 


(=a) 


H 


Thus 64 is the smallest positive integer such that 
(8 + i A = 1 mod 31 


An element with order a = 64/8 = 8 is 


8 8 


(@ en iS 6 


‚ 8’ (i20) + (2) 3 


8 5 


A 3 8 4 
TE Oa 


+ ( (120) 7 


8 3 


+ (8) 8° (420)? + (8) 8% 


(120) ° 


+ (8) 8 (i20)’ + (120)° 


(120) 


20 mod 31 


where (1) are the binomial coefficients. Expanding and 


solving, 
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(8 + 1 20) 16 + 10i - 10 - 191 + 9 + 181 - 7 - 5i + 19 


(27 t i4) mod 31 


and since, if (a + ib) is a primitive element in Ga then 


(a + ib)? is also a primitive element for j - 1,3,5, ... d-l. 


One has 


(27 + jay? mod 31 


E3 


(27 + 14)” mod 31 (4 + 14) mod 31 


(27 + AN mod 31 (4 + 127) mod 31 


(27 + 14)? mod 31 Domes 


as the primitive elements in G, for GF (31°). 


8 
In order to perform multiplications by powers of r in 
hardware, it might be desirable to represent the q-bit words 

a and b, where r = a + ib, as a minimal sum of powers of 

two. Then, for example, pn+l mod p can be obtained by 
multiplying r” by r modulo p recursively using a minimal 
number of bit rotations, q-bit word- complements, and 
additions [3]. 


That is, the primitive elements in Gg of GF (317) can 


be written: 
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(27 + 14) a C a EO i a2? 

7 + 44)? mod 31 = 44+ 14 = 27 + i 2? 

Aa a 44 i 27 = 2° + i(2> = 22 = 2°) 

Er + 14) ma 31 = 27 4 127 = (2° -27-29) + 4(27-22-2°%) 


Notice that, r = 4 + 14 has the shortest number of terms of 
power 2. This primitive element is such that multiplications 
by powers of r in Gg of er yield the least hardware. 
Such an r is called the simplest primitive element. 

In order to perform the convolution of two d-point 
sequences of. complex integers a, and Ba with dynamic range 
A and B, respectively, the components of the circular 
convolution Cp = y_ + i So are required to remain in the 


p 
interval [20]. 


p-1 
2 


= p-1 < 6 < 
5 ES Yp’ p Â 





(4.55) 


Specifically to satisfy (4.55) for all sequences aa 7 ti Ba 
and b_ = x, + i y ~ such that Jails 18_| < As and Ix |e 


ly | B, it 1s necessary [20], that: 


n 


JA 


aaB < 22 (4.52) 


If A = B, then by (4.52) the largest value of A is 
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= pri 
A [ Za ] (4.53) 
where [x] denotes the greatest integer less than x. This 
scaling constraint sometimes forces one to choose an 
excessive value p in order to avoid overflow. Such a large 


p implies a computer word length that is often undesirably 


long. 
_ „21 E: 
Example: Let p = 2 - 1] and let d= 2 
By (4.53) 
FƏ 
Aa = 2 21222 
4x2 


If a, and Ba are constrained to the interval 


one is guaranteed to keep the components of ES in the 
interval 
(Dy fe exe Doe = 1,72% 

According to [30] and [29], the Chinese Remainder 
theorem can be employed to develop a ring which is the 
direct sum of certain Galois fields GF (p°). This ring is 
utilized to extend the dynamic range of complex number 


convolutions. 
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A disadvantage of this transform is that multipli- 
cation by powers of the primitive element is not as simple 
as that developed by powers of 2, in Mersenne transforms 
and Fermat number transforms. 

Recently, Reed and Liu [36] developed a high-radix 
FFT algorithm for computing transforms over GF (p*), where 
p is a Mersenne prime. This new algorithm requires substan- 
tially fewer multiplications than the conventional FFT. 

2. Transforms Over the Finite Field GF(p) 

Colomb, Reed and Truong [19] introduces a Fourier- 
like transform in GF(p), where p is a prime of the form 
A, = 3x2" +1. This transform increases the variety of 
methods and the digital word lengths that can be used for 
computing the convolutions of integers beyond the previous 
Fermat or Mersenne number transforms. 

Let GF(p) be the finite field of residue classes 
modulo p, and let the integer d divide p-l. Also let the 
element ye GF(p) generate the cyclic subgroup of d elements, 
G, of GF(p). 


d 


Then a transform over this subgroup Ga can be 


defined by the pair [19]: 


d-1 
ap = Ja. 0<K<ad-1 (4.54a) 
n=0 
d- 
= =] -Kn 
zu ec) 2 AL Y (4.54b) 
=0 
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2 


where an! A, € GF(p) for n = 0,1,2, ... d-1 and (d) 


K 
is the inverse of (d) mod p. Also it can be shown [19] 
that the circular convolution of two finite sequences of 
integers can be obtained as the inverse transform of the 
product of he transforms defined by (4.54). 


Notice that if Pr 1S a prime of the form Sr 


the order t of GF (ph? with generator a, is given by (2.14) 


t = p- 1 = 3x2 (4.55) 
Since t has the factor d = Deis the usual FFT 
algorithm can be utilized to calculate the transform of as 
K 


Many as d = 2 Pones. Ir. d = 2.1 < K < n anda is the 


generator of GF (pa)? then the generator of Ga is (by 2.20) 


3272" = 320" (4.56) 
the basic operations used in the transform defined by (4.54) 
are addition and multiplication mod 3x2" +]. 

Algorithms for searching a prime of the form 3x2” + 1 
have been developed [19], and will not be discussed in this 
thesis. 

As a final remark about this transform one should say 
that the same type of dynamic range constraint applies here 
as in transforms over GF(p*), although in this case GF(p) 


refers to. integer convolutions. A special number theoretic 
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transform that can be computed using a high radix FFT is 
defined [23] over GF(p), a finite field of integers modulo 
primes p of the form (2M=1)2" + 1. Such primes are special 


n 


cases of numbers of the form ¿o Al proposed by Pollard 


[22]. 

If p is a prime of the form p = (27 - ye + 1 
the order t of a primitive root in the finite field GF(p) 
is given by 


Zu) (4.57) 


t = p-l = (2 
Since t has a factor Den one can choose d = Dee where 
1 < £ < nas the transform length. The arithmetic in 
GF(p) with p = (2 - 92 + l is similar to the arithmetic 
in GF(p) where p is a prime of the form 3-2%+1. Addition 
mod p involves at most a triple binary addition [23]. 
Multiplication by a power of 2 mod p can be imple- 
mented by uSing the combination of table lookups and mod p 
addition [23]. However, multiplications mod p still needs 
a full binary multiplication followed by a division [23]. 
This seems a disadvantage when compared with Fermat number 
transforms. The FFT over GF(p) is similar to the FFT over 
the complex number field, except that a root of unity y in 


A a ah integer arithmetic modulo 


Gr (p) replaces e 
p is used. 
It turns out [23], that the number of modulo p 


multiplications required for the evaluation of an FFT over 
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GF(p), p = 2 (2.87) + l, is greatly reduced when compared 
to the number of multiplications required to evaluate the 
FFT over the complex number field. 

It was found [23], that the number A = (2°-1)2"+1 
is prime only for the cases m = 1,2,4,32. Hence only 


transforms over GF(p) when p = Pecan 


+1 have practical 
application in digital filtering. The maximum transform 
length in these conditions is equal to 4 294 967 296! 

It can be shown that high radix FFT can also be 
used to compute transforms over a finite ring modulo a 
number of the form (21) 2741 with m even and m = 1 or 
ra =n, ina Similar fashion for the GF (p) case. The condi- 


tion for such a transform to exist was shown in References 


[22] and [20]. 


3. Complex Mersenne Transforms and Complex Pseudo- 
Mersenne Transforms 


The main limitations of the Mersenne transform 
approach are related to the fact that the number of trans- 
form terms q is a prime. This means that calculations of 
the transforms cannot be simplified Be FFT-type algorithm 
and that the number of transform terms is equal to the 
word size. These limitations can be slightly alleviated by 
using a root (-2) instead of 2 in (4.4) and (4.5), as said 
previously (4.8). The maximum transform length then becomes 
2q. It is also possible to increase the maximum convolution 
Size by resorting to multidimensional convolutions [3], [4], 
[31]. Unfortunately, this result is achieved at the expense 


of increased requirements for computation and storage. 
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By defining Complex Mersenne Transforms, it is 
possible to achieve higher computation efficiency while 
increasing both maximum transform length and convolution 
length. Again, in a Merseene ring, with p = 2a - 1,“ (2) 
and (-2) are respectively roots of orders q and 2q, 
corresponding to transforms of lengths q and 2q, respectively. 

Since q is a prime, 2% ana -2% are also roots of 
q and 2q, provided d is not a multiple of q (by 2.20). 


Notice that (23) is a root of order 4g, Since 
mee? = (12% 4)? = G)* a) = 1 ma 21 - 1 (4.58) 
Also (143) is a root of order 89, since 
ip = pa +j’? 
and 


7 6 .2 


f 8 8 a5 
j + (4) 1 5 + (3) 13 


8 4 .4 
+ (4? 1077 


+j) = 1+) 1 


TO 


7 
+ (5) 17 j“ + (Ë) e 4° 8 


+ (5) 15) +3 


l + 84 - 28 - 563 + 70 + 56} - 28°8j +1 


16 + 03 


91 





Then 


E mod 24] = 1 mod 24-1 (4.59) 


(143) 8 = (2°) = (29)7 = (1) 
The same conclusions can be drawn with respect to (1-j), 
ie., (1-j) is a root of order 8q, in the ring p = > 
MET = 2,3,5,7,13,17,19,31,61,... 

Higher order complex roots do not have a simple 
structure and therefore will generally not be of practical 
interest. 

Under these conditions, a Complex Mersenne Trans- 


form having 4q terms can be defined by [24], 


A = a an >= 


K mod (29-1) (4.60) 


j = v-1 a K = Ol, .../, 4q-1 


Notice that q has an inverse modulo p, and that the 


inverse of 4 is A since 
io > (4j0* mod 29-7 = 2974 (4.61) 


for 


nn Bee i mod Ol). 
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Then 4q has an inverse R such that 


(4GR) mod 27 - 1 = 1 mod 27-1 


and the inverse transform of A, is 
4q-1 
a, = RJ A, A SA] (4.62) 
K=0 


where all exponents and indices are taken, modulo 4q, 
in both (4.60) and (4.62). 
Using a root (j+1) leads to a definition of a 


Complex Mersenne Transform having 8q terms with 


+8 a (143) ” moa 29-1 CES 
=0 
K == 0,1l, s. .7? 8q-1 
and, with R such that (8qR) mod 24-1 = 1, an inverse 
transform 
8q-1 
a, = aR A, (ep mod 29-1 (4.64) 
K=0 
m = 0, 1%, ... — 8q-1. 


with all exponents and indices taken modulo 8q. 
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Computation of a complex convolution by means of 
Complex Mersenne Transforms is carried out as with the 
DFT, with real and complex parts being evaluated modulo p 
separately. In order to avoid errors due to overflow, the 
amplitudes of real and imaginary outputs must be bounded 
to (p-1)/2. This means that usually the word length of 
real and imaginary parts of input sequencds ta}! and E 
is less than half that of output sequences. In other words, 
all computations are carried out modulo p on q-bit words, 
yielding q-bit word outputs, and the input sequences are 
represented by words of length less that (q-1)/2 bits. 
Notice that the calculation of these transforms can be 
partly simplified by an FFT-type algorithm because the 
number of terms is no longer a prime. 

It has been shown [24] that, in the practical range 
of interest for q (where q = 31), substituting Complex 
Mersenne Transforms for conventional Mersenne Transforms 
results approximately in an eightfold reduction in the 
number of operations. 

If the two sequences Da) and ta, ! to be convolved 
are real, the full benefit of using Complex Mersenne Trans- 
forms can be retained by processing simultaneously two 
Successive blocks of sequence {y,} by means of the same 
Complex Mersenne Transforms. This is done by computing the 
complex convolution {Z te of the sequence ta} with the 


auxiliary complex sequence tx, me EN }. The real 


n+8q 
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part tun! and the imaginary part lu) of 12) yield 
respectively the convolution of A and the next block 
Easa? with fa}. 

Up to now, the discussion of this section has been 
restricted to Mersenne numbers, that is, to numbers 
od 


p= - 1 such that pis a prime. If p is not a prime, 


its prime factorization is given by 
(4.65) 


Recall that an m-point real transform having the circular 
convolution property can be defined in the ring of integers 
modulo p, provided m-point transforms can be defined 
separately in the fields Pjr Par ++.: Pje 
This follows directly from the Chinese remainder 
theorem (2.27), and leads to the conditions for the exis- 
tence of an m-point transform in the ring p that m must 
simultaneously divide Ppy7l, P,-1, eres p,-l. When p is 
a prime, the maximum length of the transform is M = p-l. 
Transforms in a ring p, with p nonprime, are there- 
fore proportionally shorter than transforms defined modulo 
a prime number. If p and q are composites with q = q, * dW 
and a prime, a divides p and the maximum transform 
length is governed by that possible for Sn 


This led Rader [3] to consider thatthe only transforms 


of interest in a ring modulo 293-1 were Mersenne Transforms. 
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The situation changes noticeably if one considers 
Comples Mersenne Transforms. Nussbaumer [24] shows that 
given an m-point real transform of root 2 with p composite 


and q, ,m odd integers, one can define 8m-point complex 


transforms in the ring modulo p with 


8m-l jJ = v-l 
e '., wnK (4.66a) 
Az ) a, (1+3) ; 
n=0 K= 0,1,...,8m-1 
8m-1 
È = INR 
a, = (8m) X 2 sen 7 m= 0; 1, see, Bm ll 
=0 


Notice that, the existence of an m-point real transform 
in the ring modulo p implies that m has an inverse, 

R modulo p. Also, the inverse of 8 modulo p, is 

— „473 


8 


since 8x8 = = 27 = 1 mod p 


thus 
(8m) 7? = Din - R mod p exists. 
Table I shows the various possibilities for p 
nonprime and odd q. 


The case of q prime (q = 23,29,37,41,43,47) corres- 


ponds to conventional Mersenne Transforms. When q is not a 
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prime, the corresponding transforms are called pseudo- 
Mersenne transforms. They have a very short length and 
their roots are not powers of 2. Notice that, in order to 
achieve maximum effectiveness in computing convolutions by 
means of pseudo-Mersenne Transforms, it would be desirable 
to have relatively long transforms with a number of terms 
highly factorizable. This does not seem possible with 
transforms modulo p = Sn One notes, however that when 
p is not a prime, with the prime factorization of p defined 
by (4.65), one can define transforms modulo having 
power of 2 roots and such that the number of terms is 


large and highly factorizable. 


These transforms can be defined by 


8m-1 
d. 
A, = (f a. (143) ™) moa p/p. * (4.67) 
K E n 1 À 
n=0 
n=. UE Ls. 2ml 
Jever 


Various possibilities of such transforms are listed in 
Table II. 

It can be seen that the maximum number of terms is 
both large (40 to 392 terms) and highly factorizable, thereby 
leading to efficient FFT type computation with a minimum 
number of operations. It would seem, however, that these 


advantages are offset by the fact that the various operations 
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are performed modulo PE 2. The corresponding arith- 
metic circuits are much more complex than arithmetic 
circuits modulo 29-1 [24]. 


This difficulty can be circumvented by noticing 


that as 


one can compute the convolution modulo p = 24-1 as with 
conventional Mersenne Transforms and obtain the final 
result by performing a last operation modulo P/P; + on 


the convolutions computed modulo p, i.e., 


Zn mod p/p, 3 = (2 mod p) mod p/p, (4.68) 

By proceeding in this fashion relatively long 
convolutions can be computed efficiently by means of FFT- 
type algorithms with all but the last operation performed 
with implemented arithmetic circuits operating modulo (24-1) 
[24]. 

Taking as an example the case of transforms defined 
by q = 25, one can see from Table I that the maximum odd 
length for real transforms computed modulo p = Den is 
15 terms and that the corresponding roots are not powers 
of two. By operating modulo DE it is possible 


to define real transforms having power-of-two roots with a 


maximum odd length increased to 25 terms. The maximum 
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length is then expanded to 200 terms by using complex 
roots. Such a transform can be computed very efficiently 
by means of an FFT type algorithm with a three-stage radix 
2 decomposition, followed by a two-stage radix 5 decomposi- 
tion. 

One limitation of conventional Mersenne Transforms 
is the rigid relationship between word length and trans- 
form length. In this respect, pseudo-Mersenne Transforms 
provide a significant improvement because their maximum 


number of terms Ma is highly composite and any transform 


ax 
length submultiple of Mano can be selected. 
It is even possible to have several transforms of 


identical length and defined modulo integers Pyro see Pj 


i 
that are relatively prime. The convolution can then be 
computed separately modulo Pr «+ Py and then the final 


result obtained modulo (P, "Por" “Py? by the Chinese remainder 
theorem. This approach could, for instance, be used to 
compute a 40-term convolution with an approximate word 
length of 32 bits by means of transforms defined by 

modulo (21”-1)/7 and (2°?-1)/31. 


4. Pseudo Fermat Number Transforms and Complex 
Pseudo Fermat Number Transforms 


This section considers a generalization of Fermat 
Number Transforms, such that transforms having roots which 
are powers of 2 are defined in field or ring which is a 
factor of an integer, p = 2441. With such pseudo Fermat 


Number transforms it is possible to have much more flexibility 
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in selecting desired word lengths than with conventional 
Fermat Number Transforms. In some cases it is possible 
to define complex pseudo FNT's which are well adapted for 
filtering complex signals and allow increased transform 
and convolution lengths when compared to conventional 
FNT's. 

Number theoretic transforms in a ring submultiple 
of p = 2441 are called pseudo Fermat Number Transforms 
[37]. 

If one restricts to roots 2 , one can define an 


M-term pseudo FNT and its inverse by 


M-1 4 
y wnK 1 
ee ) a, 2“) mod p/p; (4.69a) 
n=0 
M-1 
= -wmK oF 4.6 
a BR) A, 2 ) mod p/P, (4.69b) 
K=0 
m = 0,l ... M-1 
d. 
i 


With M-R = 1 mod P-P; `- 
It can be seen that these transforms have the same 
structure as pseudo Mersenne transforms but are defined in 
a ring submultiple of 2441 instead of.a ring submultiple 
of 29-1 for pseudo Mersenne transforms. 
The choice of the particular ring on which pseudo 


FNT's are defined is very important. Usually, p will be 
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divided by its smallest factors, with the remaining factors 
large enough to allow defining long transforms with powers 
of two roots. Pseudo FNT's can be defined to q even and q 
odd [37]. Various possibilities for such transforms with q 
even are listed in Table III. | 

It can be seen that there is much more flexibility 
in word length and transform length selection than with 
transforms defined modulo 2141. 

Here also as in the case of pseudo Mersenne trans- 
forms, performing the various operations modulo 2+1/p; i 
would seem rather awkward, as the corresponding arithmetic 
circuits are much more complex than those operating 
mod (2441). Again the solution is to compute the convolu- 
tion modulo p = 2441 as with conventional FNT's and obtain 
the final result by performing a last operation modulo p/p, + 
on the convolution evaluated modulo p: 


de d. 
= (2 modulo p) modulo P/P; s 


Zn modulo p/P; 
(4.70) 
Assuming the factorization of the number of transform terms 


is given by 
M = M © Mz -M (4.71) 


the pseudo FNT can be computed by a mixed-radix FFT-type 


algorithm. 
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If q is odd, it is possible to increase the maximum 
transform length [37] by using complex pseudo FNT's. The 
existence of complex pseudo FNT's can be demonstrated by 
considering an M term pseudo FNT defined in the ring 


Fl q di 


p/P; = (2 +1)/P; with a root an of order M. 


Notice that if W and q are odd, the condition 
MW = 2q (4.72) 


implies that M is even, and M/2 is odd. Under these 


conditions, (eo) is a root of order’ M/2 since 
M WM/ 2 2 
E A E 
= (-294+2%1) = 1 mod 21+1 (4.73) 


and AS is also a root of order M/2, provided d and q 
have no common factors [37]. This suggests that (25) " is 


a root of order 2M, since 


| | | 4 
(2j)"2™ = (254) 44 = (2%) 4(54)F = (-2)% (17 = 1 moa 2741 
(4.74) 
and (1+3)" is a root of order 4M since 
mej) 4M _ (143)92 = (24% = (21) = (-1)% = 1 mod 21+1 
(4.75) 
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thus a 2M term pseudo FNT can be defined as [37] 


A .. WnK di 35 v-1 
E = í ) a, (23) ) mod p/p; 
n=0 K = Deal: e. è o 2M-1 
(4.76a) 


d: 
As M hasan inverse in the ring P/P; * and 2 is relatively 


prime with p, 2M has an inverse R in the ring P/P; + and 


an inverse transform can be defined by [37] 


2M-1 A 
SR] AL (3) ) mod p-p, ` 


a K (6.76b) 
K=0 


Meme; Ly <6 2M—1 


with all exponents and indices valga modulo 2M. 

It can be demonstrated that the transform satis- 
| fies the convolution theorem [37], and that two complex 
sequences of length 2M can be cyclically convolved via 

complex pseudo FNT's modulo P/P; i, 
In such an approach, all arithmetic operations are 
performed as in normal complex arithmetic with 37 = -l, 
| and real and imaginary parts treated separately modulo p. 
| The final convolution product is obtained by performing a 
last operation modulo P/P; q 


A 4M-points complex transform can be defined by 


using a root (1+3) instead of (23) [37] 
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4M-1 d 
A, = () a y PES mod P/P; x (4.77) 


n 
n=0 


K = 0,1 eee 4M-1 


Higher order complex roots have a complex structure and 
therefore the maximum length of complex pseudo FNT's which 
can be computed without multiplications is 4M in the general 
case, and 8g when W= ] (M= <4) . 

It can be seen that using complex pseudo FNT's 
allows for a given computation complexity, operation over 
transforms and convolution lengths twice as long as with 
real Fermat and pseudo FNT's. 

In particular, when W = 1, the maximum length of 
an FNT is 4q for a root Y2, while the maximum length of a `; 
complex pseudo FNT is 8q for a (1+3) root. 

Various possibilities, q odd, for complex pseudo 

| FNT's are listed in Table IV. It can be seen that for q 
prime, we have complex transforms of length 8q with root 
(1+3). These transforms have a number of terms which are 
not highly factorizable. 
A more interesting case seems to correspond to 
those nonprime values of q, for which it is possible to 
define transforms with a number of terms which are both 
large and highly factorizable and therefore lead to efficient 


computation via an FFT type algorithm. 
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In these respect, the 200-points and 392-points 


iy eee 


transforms defined respectively modulo (2 

741) /3.43 are particularly attractive. 
It is sometimes desirable to compute convolutions 

with improved dynamic range. In this case the same convo- 

lution can be computed modulo several relatively prime 

integers Py: Por -++ BD and the final result obtained 

modulo (py "Pa" *Pj) via the Chinese remainder theorem. 

For this application, the availability of complex pseudo 

Mersenne transforms and pseudo FNT's having the same length 

and defined modulo relatively prime integers is particularly 

interesting. A 200-point convolution could for instance 

be computed with a dynamic range of about 40 bits via complex 


pseudo Mersenne transforms defined modulo Bee and 


via complex pseudo Fermat transforms modulo O A 
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V. IMPLEMENTATION OF FERMAT NUMBER TRANSFORMS 


The best known number theoretic transform is the Fermat 
Number Transform (FNT). FNT are potentially attractive 
for digital filtering applications because they have the 
convolution property (3.5) and can be computed without 
multiplications. 

In principle, such Number Theoretic Transforms (NTT's) 
could be implemented in the same way as Discrete Fourier 
Transforms but with multiplications by trigonometric func- 
tions replaced by multiplications by powers of two, all 
“Operations being performed modulo a Fermat number. 

In practice, however, direct transposition of Fast 
Fourier Transform (FFT) architectures does not necessarily 
lead to optimum implementations and the development of 
Special configurations to computing NTT's seems worth 
exploring. Along these lines, the special attributes of 
the FNT, led several researchers to consider various coding 
techniques for simplifying the implementation of the trans- 
form and special purpose implementations of the FNT. 

This section will discuss these concepts. In part A 
Various coding techniques are presented and in part B the 


realization of the FNT is considered. 


A. BINARY ARITHMETIC FOR THE FERMAT NUMBER TRANSFORMS 


In computing the FNT, arithmetic is done modulo E, = 2P, 
b= 25 (4.11). In this arithmetic the only allowed integers 
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mee {0,1 ... 2°} and all integers whose absolute value do 
not exceed 


b 
ae redet +” l= 2_ = 201 CAD 


can be represented unambiguously. 

Negative numbers are represented by adding 2°47 to 
them; this is similar to twos complement and ones comple- 
ment representation of negative integers. 

Notice that in a b-bit register, all integers from 


0 to 221 can be represented. 


Example: Let E, = 2241 = > = 17 b=2° =2 = 4 


(i) allowed integers {0,1,2, ..., 16 = 2 
(ii) absolute values of number that can be represented 


unambiguously do not exceed 








E u et = ey, = ¿b-1 a „4-1 - 8 
(iii) negative numbers: 
-5 = (-5 +17) = 12 mod 17 
(iv) ab = 4 bit register allows as maximum 
Bo auası eis oa 1 Ds 
2 10 10 


al 








Thus, uSing a b-bit register, the problem remains to 


represent the quantity 2 (in the example above, an = 24 = 16). 


If data is uncorrelated, the probability that this number 


will appear after an arithmetic operation is approximately 


2” 117). 


For digital filter applications, b would typically be 
32 or 64; in these cases the occurrence of oe is extremely 
small. 


In their software realization of the FNT, Agarwal and 


Burrus [4] define a binary arithmetic modulo F, = 2°41, 


ce 
b = a The representation of such a modulus requires 


(b+1) bits, for the representation of the quantity 2° = 1 
mod F,- In order to simplify modular arithmetic, Agarwal 
limits his realization to a b-bit arithmetic. 

This involves some input quantization error where (-1) 
occurs as an input sample, as well as the extremely small, 
but realistic, probability a complete data block in 
error when a (-1) occurs as the result of an ENT computation. 


The following discussion is based on the b-bit represen- 


tation of integers. The various basic arithmetical opera- 
tions can be implemented modulo E, [4]. 
(i) Negation 
b-1 i ‘ 
Betsen =1 32, fa. 27, a. = 0 or 1 (5:2) 
i=0 * 


Then by [4] 
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ewe) a, 2 = fa, 27 = (2°=1) S 
1=0 1=0 
Example: Let A = 8 = (1000), = 8 mod 17 (= eye 
Then 
EMP (oti = (2°-1) = 7 = (5) = 48 


Notice that (5.3) can be arranged in the following form: 


b-1 
A= ja 2-02 
i=0 
b-1 
= J a, 2% - (2°-1) + (2°41) moa F, 
i=0 
b-1 
rer (5.4) 
i=0 


Thus to negate a number, one has to complement each bit 


and add 2 to it. 


Example: Let Fl = 17 
0111 
A = 8 = 1000 -A=-8 = { +10} = 9 mod 17 
1001 


LES 





t 
0111 
and A= 8 = 1000; -8 = +10 = 9 mod 17 
1001 


Example: Let F 24 +d eS 17 
| 


(ii) Addition 
When one adds two b-bit integers, one obtains a 
b-bit integer and possibly a carry bit. The carry bit 
represents 2 = -1 mod F,- 


To implement addition in arithmetic modulo 2P 


+1, 
one simply subtracts the carry bit. Thus the hardware 


should be of the carry subtract type. 





Example: Let FL = 17 
eed 
15 + 10 = 25 = 8 mod 17 ={ + 1010 )= 8 mod 17 
Qhoo1 


(111) Subtraction 
Subtraction is implemented as an addition by first 
negating the subtrahend and then adding terms. Addition 


must be carried out according to (ii). 


Example: Let E, = 17 
13 = 1101 
13 - 4 = 13 + (-4) =Y -4 =(J0100 —1011 
+10 
1101 
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f 





1101 
1101 
010 
zu 
1001 = 9 mod 17 


13 + (-4) = 9 = 





(iv) General multiplication 
When one multiplies two b-bit integers, one gets 
a 2b-bit product. Let Cy be the b-bit low order of it 
and Cu be the b-bit high order part of it, then 
b 


ABR ICO. + E 2 = C, + (=Cj) mod F 


L H L € 


Thus, all one has to do is subtract the higher order b-bit 
register from the lower order b-bit register. The subtrac- 


tion needs to be done according to (iii). 


Example: Let FL = 2° +I = I 


15x10 = 150 = 14 mod 17 








Now 
150 Š (1001 0110, ‚ and by (i) 
10 C e 2 
H L 
Cr = 0110 
Ch = 1001 0110 (=C py) = 1000 
+10 1110 = 14 mod 17 
1000 
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(v) Multiplication by a power of 2 

If x is taken as 2 or a power of 2 (in 3.7), 
the only multiplications involved in taking the Fermat 
transform are those by some power of +2. These multipli- 
cations are particularly simply to implement in arithmetic 
modulo F,- Suppose one needs to multiply A by a 
0 < K < b, all one needs to do is shift to the left the 
content of the register by K bits, and subtract the K 
overflow bits (assuming double b-bit registers). If K is 
outside the range 0 < K < b, one makes use of the fact 
2P = -] mod FY 

Computation of the inverse transform requires 


= multiplications by negative powers of 2 which can be 


converted to positive powers by the following relationship 





oo = BL. ok = -2%É moa Pe (5.5) 
_ nt 2 
Example: Let Fy = 2" + 1=117 
(1) A os mod 7 Eoo 
| Shift left 2 positions 0011 1100 
| C C 
| H L 
C = 0011 — 1100 c, = 1100 
+10 (-C,,) — 
p eur 
i 





1001 = 9 mod 17 
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en 


i 
| 
| 





(ii) iec2e = 13x (-2% ) = 13x (-2) = -26 = 8 mod 17 


13 = 0000 1101 


CH ST 


Shift right &ircularly 3 positions 10910 0001 
G € 
H 


L 
Cr = 1010 0101 Cy = 0001 
+10 (-C,) ORAL 
“Cy = 0111 1000 = 8 mod 17 


For the implementation of the Fast Fermat Transform, unlike 
the FFT, one does not need to store the powers of x. For 
serial arithmetic, one could have a register which stores 
the shift amount K, and as one goes along the Fast Fermat 
Transform flow chart, one continually update the shift 
amount. This realization of b-bit arithmetic involves, as 
previously said, some input quantization error when (-l = 2°) 
occurs aS one input sample, as well as the extremely small, 
but realistic, probability of a complete data block in error 
when a (-1) occurs as the result of a FNT computation. 

| It is, of course, desirable to compute the FNT 
exactly. The difficulties in performing binary arithmetic 


modulo F become apparent when one considers, for example, 


+? 
multiplication or addition in a ring of integers modulo Por 
involving the binary representation of -l = 2°. For example, 


when b = 4, 2941 = 17 the product of 10000 (-1) with itself 


is 00001(+1). 
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A technique for the exact computation of the FNT 
and its implementation are described by McClellan [26]. 
McClellan's approach involves the definition of a new binary 
code representation for the integers modulo FY: 
| Given a binary representation of (b+l) bits, 


A = la, » a_] (5:6) 


an. al 


this new code is described as follows. If 


ay = 1 then A = 0 
(5.7) 
u _ b-1 b-2 
| a, = 0 then A = 0 -1 2 + Oo, 5 2 O 
where 
1 Lf = = 1 
7 = (5.7a) 
-1 if a. = 0 
J 
| b 4 u 2 
Example: Let Fr = 241 = 2+1 = 17, b=4 
10000 represents Zero 
00111 -2° + 2° + 2! + 1=-1= 16 mod 17 
01011 2-22 +2+1= 7 mod 17 
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and 10101 is an illegal combination, Since the only allowed 
number with the most significant bit (MSB) equal to 1l is 
10000 (zero). Consider arithmetic operations using this 
number representation. 
(i) Multiplication by a power of two. 

If the number is zero (i.e., a, = 1) one does nothing. 
If the number is nonzero, the low order b bits are circularly 
shifted to the left a number of places equal to the power 
of 2, and a bit is replaced by its complement as it enters 


the least significant bit position (LSB). 


Example: Let FL = 2°41 = 2741 = 17, b= 4 
Using the proposed code 
E 2 
01100 = 2” +2’ > 2-1 = 9 mod 17 


applying the above rule 


9x2 = 


"y 


4 lower order bits 


9 01100 
x 
2 

01001 


18 = 1 mod 17 = 01000 = 2°-2°-2-1 = 1 mod 17 
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Further, 


9x8 = ? 
01100 
x 
2 
01001 
18 = 0100 
x 
2 
00001 
36 Bi 
2 IN, 
00000 


72 = 4 mod 17 = 00001 = ne = -13+17 = 4 mod 17. 


In a hardware implementation the MSB is used as a control 
bit. If it is one then the number is zero and the rotation 
is inhibited. This is characteristic of all operations 


using this new coding scheme. 


(ii) Negative of a number 
This is done by complementing the low order b bits 
except in the case where a, = l. Again the MSB is a control 


bit that inhibits the operation if it is one. 





Example: Let Fe = 2 +1 = De =. 17/7; bD 


Using the proposed code 
A = 01101 = 2° +2” - 2 +] = 


and by applying the above rule 


o a aaa mod 


(iii) Addition 
If either or both of the operands for 
Nero (i.e., an = l or ch = 1), then there is 


to take place. That is, these special cases 


11 mod 17 


17 


addition are 
no addition 


can be sensed 


and the addition inhibited. Now consider the addition of 


two numbers A and C where A # 0 and C ž 0. 


Let 


a with 
O 


and 


C with 


a 


C 


1 
© 


(5.8) 


il 
© 


Interpret the b LSB's of A and C as the binary 


A MN N N 
representation of A and C, and form the same A and C using 


N 


unsigned binary addition to obtain D. 


21 


a 








That is 


> 


A = 4.4 2 + ar) 

+ 

A b-1 

gerne 
MC=-D=- 4d 2> +4, _, 2 


It is possible to deduce from D, the desired sum 


D = (A+C) mod 2 
Notice that 
A = 2A + 2 [26] 
> = 
Example: Let E, = 241 = 
Using the proposed code 
3 
A = 01010 = 2 - 2 
by (5.9) 
A 2) > 
= 8+ 2 = 10 


Sa [26]. 


+ 2-1 


+ (1) 2? 
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er te a, 
(5.9) 
.. +C 
O 
b-2 
2 + . + do 
(5.10) 
17 
= 5 mod 17 
+ 0 (2°) 





and, 


2A = 2(10) = 20 
+ 
2 
DAR+2 = 20 +2 = 2 = 5 mod 17 = A. 


To deduce from D the desired sum D, verify that 


D = A+C = 2A+2 + 2C+2 = 2A+2C+4 mod 2°41. 


Example: Let E, = 17 
and 
A = 01010=5 “A = (1)2° + (0)2° + (1)2 = 8+2 = 10 mod 17 
Bee c = (1244+ (1)2 = 3 mod 17 
So 
D=2A + 20 +4 = 2(10) + 2(3) + 4 = 30 = 13 mod 17 


and, checking 


D=C+A=8 +5 = 13 mod 17. 
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If D can be expressed as 


D = 2D+2moa 2°41 (5.12) 


with D a b-bit number, then the b-bits of D are the 
b-LSB's of D [26]. 
Example: Let FL = 17 


and 


oO 
N 
oO 
fe 
oO 
= 
o 
lI 
N 
| 
N 
+ 
N 
l 
ER 
Il 
[= 
- 
l 
wn 
ll 


> mod 17 


Since D can be expressed as: 


D = 2D+ 2 mod 17 


where 


TO ee 0)l- 4 4m E O LE a oo 


OI 
ll 


(check: D=2 D+ 2 = 2(10) + 2 = 22 = 5 mod 17) 


the 4 bits of D = 1010 are the 4 LSB's of D = 01010. 


There are two cases depending on the value of dá, in (559). 


MW-=- If d, = 1, then by [26] 


b 


D = 2 +D = p'-1 mod 2741 G 
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Example: Let F, = 17 


€ 
and 
| A = 01010 = 5 mod 17 
C = 01011 = 7 mod 17 
one has 
- 3 2 i ee 
A= {1)2 + (0)2° + (1)2° + (0O)1 = 8+ 2 = 10 
— _ 3 2 1 
C ==(1)2” + (0)2° + (1)2° + (1)1 = 8 +2 + 1 = 11 
— 4 3 2 JE 
D= (1)2 + (0)2” + (1)2” + (0)2” +1 = 21 = 4 mod 17 
Comparing with 
D b b-1 
D= 4 2) +4, 2 Dec 
one verifies that 
dy =: Ap 
in these conditions 
D = 2 + D' mod 17 


Where, by (5.13) 
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Then 
D = 16+5 = 21 = 4 = 5-1 = 4 mod 17, 


checking (5.13). Thus, 


Gee (oN + 2C + 4) mod 2°41 = (2D+4) mod 2°+1 
(5.14) 
D= (2C' + 2) mod 2° + 1 
and 
C= 7c (5.14a) 
Example: Let Fy = 17 
A = 01010 = 5 
= 01011 = 7 
D = A+C = = 12 
one has 
A = 10 
C = 11 
D= AtC = 21 
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SO 


(1) D= 2 (A +C) + 4 mod 2P + 1 

D = 2(10+11) + 4 = 42 + 4 = 46 = 12 mod 17 
(2) D= (2 D + 4) mod 17 

D = 2(21) + 4 = 46 = 12 mod 17 
(3) In the previous example D' = 5 

so 

D = 2(5) +2 = 12 mod 17 


D', in (5.14a), can be verified: 


The condition D 


A = 01010 = 5 
+ 
C = 01011 = 7 


OU 
il 
y» 
+ 
Q 
il 
= 
O 
þ- 





0101 = -8+ 4 - 2 + I = -5 = 12 mod 17 
Since 
D = (0)2> + (1)2% + (0)2 +1 = 5 mod 17 (by 5.12) 
then 


D = D' = 5 mod 17. 
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B= If da, = 0 in (5.9), then [26] 


D 
and 

D 
Example: 

A 

C 


Bince by (5.12) 


D 
and by (5.16) 

m 
checking, 

D 

D 





Z 2D' + 4 mod 2”+1 (5.15) 
= D' +1 (5.16) 
Let Fo = 17, b=4 
= 00 111 = 16 mod 17 
= 01 011 = 7 mod 17 
E 
= 00 010 = -8-4+2-1 = -11 = 6 mod 17 
3 2 al 
= (0)2” + (0)2” + (1)2 + (0) = 2 


Z 2 D' + 4 moå 2°+1 by (5.15) 


2(1) +4 = 6 mod 17. 
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Notice, that, this way of performing addition results in an 
extra level of add, as in the case of 1's complement arithmetic. 
In l's complement arithmetic, the output carry is 
added to the LSB. 
In this new el arithmetic, one takes the output 
carry, complements it and adds it to the LSB. That is, 
this new arithmetic is only as complex as the 1's complement 
arithmetic. | 
There is a small amount of additional complexity due 


to the control bit, but this acts only as an inhibit signal. 








Example: 
(1) 01111 = +8+4+2+1 = 15 mod 17 
00011 = -8-4+2+1 = -9 = 8 mod 17 
ex 
0 
00010 = -8-4+2-1 = -11 = 6 mod 17 
(2) 022172 =7]5 mod 17 
10000 = O 
01111 = 15 mod 17 
MSB = 1 inhibits the addition 
3) 01011 = 8-4+2+1 = 7 mod 17 


(+)00100 = -8+4-2-1 = -7 mod 17 


10000 = 0 mod 17 
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In this last example note that the second add automatically 
produced the control bit indicator, for the special case 
of zero. 

How does one convert from a binary coded representation 
of numbers to this new representation? 

The code conversion between a binary representation 
and this new code falls into two cases. 

Let M be a number which is represented in both codes. 


Let 


Ay... m (5.16) 


be the binary representation of M. 


Let 


m, M_] + m (5.17) 


be the new representation. 

Also, let M be the number represented by interpreting 
(5.17) as a binary code. The conversion rules are as 
follows: 

1) I£M= 0, then Mm, = 1, M = 0, and Mg = 0 for 
Be 2001,22: bel. 
This is a special case and is done separately. 


2) If M # O, then m, = 0 and 


Too, mod 22 F e (5.18) 
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Conversion from the new code to binary is implemented 
by forming 2M + 2 and comparing this sum to 2? 
If the sum is larger than >, then 2°47 is subtracted 


to give the proper binary representation of M. 


Example: Let Py = 17 
M = 01011 new code = 2°-2°+241 = 7 mod 17 
M = 8+2+1 = 11 
By the above rule 
2M+2 = 2(11) + 2 = 24 = 7 mod 17 
and 
7 = 00111 binary representation of M. 


If the binary representation is given the sum 


Mt =- (5-19) 


is formed. If the result is odd, 2°+1 is subtracted; and 
finally this result is right shifted one place. The resulting 


b bits are the b LSB's 


M4 m > DM,» 
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Example: Let E, = 17, b = 4. 


11111 = 14 mod 17 


Given the binary representation of M 
to obtain new code: 
(1) form 


mk Eee 


the result is odd then by the above rule. 


(ii) 
29 = 17 = 12 
(iii) 
12 _ i : 
> = 6 00110 binary representation 
(iv) new code 4 LSB's of the binary obtained in 


(111), i.e., 


new code 00110 = -8+4+2-1 = -3 = 14 mod 17 


Notice that the code described so far, due to McClellan [26], 
is a special case of a more general class of code trans- 
lations discussed by Leibowitz [27]. 

These translations involve the one-to-one representation 


of a number A in the ring of integers modulo F as the 


+? 


binary number corresponding to 


RA - 1 mod FL (5.20) 


where R is any integer in the ring with an inverse [27]. 
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Notice that the code representation of McClellan [26] 


corresponds to the case of [27] 


R = 2°) +1 mod F, ' (5.21) 
Example: Let FL = 2P + 1 = 24 +1, b= 4 
then 

R = 27141 = A = 9 mod F 


Using (5.20), for A 


> mod 17 the code will be given by 
RA-1 = 9(5)-1 = 44 = 10 mod 17 


10 


10 01010 code 


ll 


checking, using (5.7) 
ewes to = 22a =] 10=5 75 mod 17. 


Notice that the simplest code translation corresponds to 
R= 1, for any value of b. Leibowitz [27] concentrates on 
the resulting binary arithmetic for the code translation 
corresponding to R= l. 

Recall that to represent all integers in the ring modulo 
F, requires (b+1) bits. The additional bit is required in 


t 


order to represent the number 2P = -] mod Fy: 
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In order to overcome the problem of performing binary 
arithmetic with this additional bit, one allows the addi- 
tional bit to be a 1 only when the number to be represented 
is a zero. 

One way to do this is achieved by subtracting 1 from 
the normal binary representation of every integer in the 
ring and corresponds to the above set of code translations 


with R = l. 


Example: Let F 2 +1 = 2 +1 = 17, b=4 


Normal value 


Binary representation 


Diminished -1 


value (R 1) 

0 00000 1 

1 00001 2 

2 00010 3 

3 00011 4 - 

4 00100 5 

5 00101 6 

6 00110 7 

7 00111 8 

8 01000 9(-8) (5.22) 
3-3) 01001 10 (-7) 

10 (-7) 01010 11 (-6) 
11 (-6) 01011 12725) 
12(-5) 01100 13 (-4) 

13 (-4) 01101 14 (-3) 

14 (-3) 01110 15 (-2) 

15 (-2) 01111 101-1) 

16 (-1) 10000 0 
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In the diminished -l number representation the b-least 
significant bits (LSB's) indicate the normal value of the 
number. The numbers from 1 to 2 are represented in order 
by the binary numbers from 0 to oe, 

Using this representation, the arithmetic operations 
necessary to perform convolution by FNT's, will be discussed 
next. 

1. Negation 


It can be seen from (5.22) that each of the negative 


numbers ee = 8) is the b-LSB's complement of its 


positive counterpart. 


Example: Let Fy = 17 
Diminished -l value Binary representation 
A = 6 | 00101 
-A = -6 = l1 01010 


Notice that if the MSB is 1, the negation is inhibited. 
Thus, the negative of a nonzero number in the diminished 
-l representation is the complement of its b-LSB's. 
2. Addition 
To perform addition of two numbers represented as 


(A-1) and (B-1) 


(A-1) + (B-1) = (AtB-1) -1 


1735 








and thus 
(A+B-1) = [(A-1) + (B-1)] + 1 (52 23) 


Since the C T bit of the addends is used only to 
inhibit addition if an addend is zero, addition of nonzero 
addends involves only the b-LSB's. 

Equation (5.23) indicates that a 1 must be added 
to the sum of two diminished (-1) numbers to provide a 
correct result. When a carry is generated from the b-bit 
sum, a residue reduction modulo F, requires the subtraction 


1 


of a 1 since 2? = -] mod Fy and no corrective addition is 
necessary. Thus to add to numbers in diminished -1 
representation: 

1) If the MSB of either addend is 1, inhibit the 
addition and the remaining addend is the sum. 

2) If the MSB of both addends are 0, ignoring the MSB, 
add the b-LSB's, complement the carry and add it to 
the b-LSB's of the sun. 

The (oye bit or MSB of the resulting sum is the carry 


out of the pen DEE. 





Example: Let FL = 17,b=4 
=. Diminished -1 value Binary representation 
add 3 00010 
+ 
2 00001 
1 
5 00100 
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aa 7 00110 
+ 

0 10000 

7 00110 

3. Ss le 01100 
+ 

9 01000 

en 

Be. 0 

aa er, 00100 

T add 7] 01010 
+ 

6 00101 

en 

Be 1 

17 = 0 mod 17 10000 


3. Code Translation 
Let B represent the binary representation of a given 
number and D its diminished -l representation. 
To perform code translation from binary to diminished 
-l representation, one does a diminished -l addition of B 


and the binary representation of AS al 27%: 





Example: Let Fy = 17, b= 4. 
Binary number B = 00101 = 5 

2-2] = 01111 

2100 

. 

Diminished -1 value D = 00100 
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i.e., the binary number B = 00101 = 5 is represented in 


diminished -l representation by 


D = 00100 = (4) = (5-1). 


The translation from diminished -l representation 


to binary representation, is performed by complementing the 


MSB of D and adding it to the b-LSB's [27]. 


Example: 


Diminished -l 


O 

it 
> 
‚2 
© 
Ly 





it 
O 
O 
O 
j= 
Ea 


Binary B 





Miele: D en 0 Diminished -l representation 
0 
B = 00000 Binary representation. 


4. Subtraction 
One can perform subtraction in the dimished -l 
arithmetic by negating the subtrahend and adding it to 


the minuend. 





Example: 
Diminished -l Binary representation 
subtract 8 00111 
6 01010 
ieee 
0 
2 00001 
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5. Multiplication by Powers of 2 


In performing a multiplication if the multiplier or 
multiplicand are 0, as detected by the presence of a 1 in 
the (b+1)th bit, the multiplication is inhibited and the 
product is zero. 

To perform multiplication of diminished -1 numbers 


by powers of 2, notice that: 


(A-1)2 


(2A-1) - 1 


and thus 


(2A-1) 


(A-1)2 + 1 (5.24) 


therefore, each multiplication by two involves a left shift, 
ignoring the MSB, and a corrective addition of a 1. If the 


bit shifted out from the pth 


position iS a zero, it is com- 
plemented and shifted into the LSB in order to accomplish 
the addition of al. If this bit is a l, a subtraction of 
1 is also required to accomplish a residue reduction (29 E 
With the corrective addition of +1, these cancel out and a 
0 is shifted into the LSB. 

Thus for each factor of 2, a left circular shift 


of the LSB's is required and the bit circulated into the 


LSB is complemented. 
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Example: Let F, = 17, b= 4 


NGeao) = 2) = Dee x 2x 2x9 = 8 mod 17 


MSB 
vi 
9 LOS 
ls 
2x9 000600 
f 
4x9 00001 
8x9 00011 
16x9 _ 010111 8 moá 17 


6. General Multiplication 


The last operation required to carry out convolution 
with the FNT is a general multiplication by any two 
integers modulo FY. 
To perform a multiplication of the numbers A and 


B represented as (A-1) and (B-1) in diminished -1l number 


representation system, notice that 


(Boy wba) 


AB » (AFB) + 1 


(A-B-1) - (A+B-1) +1 
and the desired result is 
(A-B-1) = (A-l) (B-1) + (A+B-1) - 1 (5.25) 


thus, to carry out such an operation, ignore the MSB's, 


perform a binary multiplication of the diminished -1 
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Beer esentation of A and B, add this result to the b-LSB's 
of the diminished -1 addition of A and B and then perform a 
residue reduction by a diminished -1 subtraction of the 
b-MSB's from the b-LSB's. 

Notice that this particular general multiplication 
scheme is not applicable with code translations other than 
that corresponding to R= +]. As discussed previously, 
if the MSB of either the multiplier or the multiplicand is 
1, then the multiplication is inhibited and the result is 


set to zero. 
Example: Let F, = 17, b= 4. 


Multiply 


143 = 7 mod 17 


binary multiply diminished -1 add 
01100 01100 


01010 01010 


011000 Quito 


0110000 0 


01111000 00110 
binary add tae 


01111110 





residue 1000 


reduction xo 


0 
00110 7 mod 17. 
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An alternate multiplication technique can also be considered. 
Assuming that the numbers are determined to be nonzero and 

a multiplication is required, a translation to normal binary 
coding is performed. Following a binary multiplication, a 
residue reduction by diminished -1 subtraction of the b- 

MS's of the product from the b-LSB's is carried out. The 


result is the desired product. 








Example: Let FP, = 17, b= 4 
multiply 
X 
_11 
143 = 7 mod 17 
translate €OL100 binary multiply 01101 
l 01011 
01101 01101 
01101 
translate (Ao1o 011010 
1 10001111 
01011 0111 
Quio 
0 





7 mod 17 00110 


Since in most cases the translation from diminished -1l to 
binary will be simpler and faster then a general binary 
Or diminished -1 addition, the second technique is the most 
preferable. 

To conclude part A of this section, one can say that 


Leibowitz [27] presents a binary arithmetic applicable to 


P 





the exact computation of the FNT, that this diminished -1 
representation is mathematically simpler than that of 
McClellan [26]. Also the hardware presented in [26] to 
compute the FNT, and discussed in part B of this section, 
can be applicable to this new technique with the exception 


of the code translation. 


B. SOFTWARE AND HARDWARE REALIZATIONS OF THE FNT 

The Fermat Number Transform has been proposed as an 
aid to the rigid and precise computation of convolution for 
digital filtering applications. Since this transform does 
not require multiplications, it is considerably faster 
than the FFT [17]. 

In what follows, a discussion of a software implementa- 
tion of the FNT, due to Agarwal and Burrus [4], as-well as 
the description of a special purpose hardware to compute the 
FNT, realized by McClellan [26], is presented. 

In their assembly language realization of the FNT, on 
the IBM 370/155 computer, Agarwal and Burrus define a 


b 


binary arithmetic modulo F, = 2° +1, b= pie 


E 
Notice that the IBM 360/370 series uses a 32-bit word 


length for fixed-point arithmetic, and therefore is well 
suited for the implementation of convolution using the FNT 
modulo F, = RER 

It has two's complement representation of negative 
integers, i.e., 


(-x) is represented as 232 - X (5.26) 
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Since in arithmetic mod F, One wants negative integers 


to be represented as a complement of 22 +1, eee 


(-x) is to be represented as Dane) (5.27) 


One adds 1 whenever a negative number is encountered in 
data. 

As noted before with b bits, the quantity De = -] mod FL 
has no representation. Thus in the 370/155, one cannot 
represent ze = -l. If a (-1) is encountered in the data 
it is rounded either to 0 or to -2. If data is uncorre- 
lated, the probability that (-1) will appear after an 
arithmetic operation during the various stages of the FNT 
computation is roughly 


-32 ~ -10 


2 10 (5.28) 


To add two 32-bit integers modulo F the logical add 


57 
instruction (ALR) is used which adds the two integers and 
sets the condition code depending on carry or no carry. 

After the logical add instruction, a conditional branch is 
taken depending on the condition code. If the condition 

code indicates a carry bit in the logical add operation, one 
is subtracted from the result, otherwise it is left unchanged. 
This sequence of operations completes one addition modulo Fee 


Similarly, subtraction modulo Fo is performed using a 


logical subtraction (SLR) instruction followed by a conditional 





branch instruction. To multiply a number by 


2° mod F O < K < 32 


5 7 
the number is loaded in the odd register of an even-odd 

fair of registers. The even register is filled with zeros 
and this double register is shifted left by K positions 
using a shift left double logical (SLDL) instruction. 

After the shift operation, the even register is subtracted 
from the odd register, modulo Fo. This sequence of instruc- 
K 


tions completes multiplication by 2° mod Fo. 


To multiply a number by 
modi E: 0< K< 32 ; 


the number is loaded in the even register of an even-odd 
pair of registers, the odd register is filled with zeros 
and this double register is shifted right by K positions 
using a shift right double logical (SRDL) instruction. 
After the shift operation, the odd register is subtracted 
from the even register, modulo Fo. This sequence of 
Operations completes multiplication by 2 mod Fo. To 
multiply two numbers mod Fe requires a somewhat larger 
number of operations. First one determins the signs of 


the two integers and multiply the signs. If any of the 


numbers is detected as negative, its absolute value mod Fo 
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is taken by taking its two's complement and adding one to 
it. Their absolute values are multiplied (MR) to obtain 

a 64-bit double-register product, then the even register 
(higher order 32 bits) is subtracted from the odd register 
(lower order 32 bits) mod Fo. Finally, if the product of 
signs was detected negative, one multiplies the result by 
(-1) by taking its two's complement and adding one to it 
15.27]. 

Assembler subprograms were written [4] to compute Fast 
Fermat Number Transforms and inverse Fermat number transforms 
for an sequence length from > to 2°, taking x as a power of 
2. A decimation in frequency algorithm with normally 
ordered input and bit reversed output was used for the 
computation of the Fast Fermat Number Transform [4]. A 
decimation eins algorithm with bit reversed input and 
normally ordered output was used for the computation of the 
fast inverse FNT. 

For both of these subprograms, the only multiplications 
required were by a power of 2, which were implemented as 
discussed above. Two more subprograms were written to 
compute fast FNT's and inverse FNT's for length - 128 
sequences, using a = Y2 given by (4.16). 

Multiplications required to multiply the two transforms 
are general multiplications mod Fr and are performed as 
discussed earlier. 

To cyclically convolve sequences longer than 128, two- 


dimensional convolution schemes were used. 
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One such scheme was 2 by 128 convolutions for length 
256 sequences discussed in section 10 of [18]. Another 
program was written to convolve long one-dimensional 
sequences uSing two-dimensional FNT, as discussed in 
Appendix B. In Section 6, results of the comparison with 
the FFT are presented. 

The special attributes of the FNT led several researchers 
to consider seriously special-purpose hardware implementa- 
tions of the transform. The machine constructed by McClellan 
[26], is an implementation of a 64-point, 16-bit FNT 
(modulo F, = E + i). This mea system applies, the 
second coding scheme, described in part A of this section, 
to the logic design of the butterfly of the FNT algorithm. 
The fast FNT algorithm implemented is a radix-2 constant 
geometry decimation in frequency (DIF) decomposition of 
the FNT. 

Fig. 1 shows a flow diagram of this algorithm for a 
16-point transform [38]. Although the constant geometry 
structure does not allow in place calculation of the trans- 
form, it simplifies the memory addressing because the addressing 
does not vary from stage to stage. The price one pays for 
this simplification is a doubling of the memory size 
required for the transform. However, 128 Nord per chip is 
a convenient level of integration with ECL (emiter coupled. 
logic), thus making the constant geometry structure 


attractive [26]. 
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FIGURE 1. Flow Diagram of a Radix-2, 16 Point, Constant 
Geometry FFT Algorithm Using the Decimation 
in Frequency Structure 





Fig. 2 is a block diagram of the complete system showing 
the four major subsystems. The computational element (CE) 
is a radix-2 DIF butterfly for the fast FNT algorithm; the 
memory element contains 128 x 17-bit words for use as inter- 
mediate storage during the computation of the transform; 
the control element is a hardwired implementation of the 
fast FNT algorithm [26]; and the input/output (I/O) section 
provides the interface with the fast digital processor (FDP). 
McClellan states that the goal in building this hardware was 
to construct a CE that would operate at a clock rate of 40 
MHz. In order to achieve this speed, ECL 10K circuits were 
used. The basic gate in this logic family as a propagation 
delay of 2 ns, and thus these circuits are well-suited for 
very high-speed systems. Even with such high speed logic 
circuits, two levels of reclock and fast carry addition were 
used in the CE to realize a working system that runs 
reliably at 38 MHz [26]. 

Fig. 3 shows a functional diagram of the butterfly 
which consists of an adder, a subtracter, a rotator, input 
buffer registers Ra and Rg’ reclock registers ES Ry and 
Ry and an output register Rz- Register transfers are made 
at each clock pulse, so that data are always flowing through 
the CE as would be the case in a pipelined fast FNT. 

As the timing diagram in Fig. 4 shows, the output of 
the butterfly is only written into memory from R, at t, and 
t.- During the other clock epochs the contents of R, may 


be changing but this does not affect the algorithm. 
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Recall that in McClellan's code, the rule for addition of 


two nonzero numbers, 


16 715 °°° aI 


and 


B = [b] g by. Bites bo] 1s as follows: 


STEP1: Add the 16 LSB's of A and B with the 
carry in equal to zero 
STEP2: Complement the carry out from step 1 and 


add it to the sum of step l. 


If either A or B is zero (i.e., b or a = 0), then the 


16 16 
carry must be inhibited. Finally if both A and B are zero 
the MSB of the sum is set to one. Fig. 5 shows a realization 
of the addition process. The eure of Fig. 5 is ineffi- 
cient in two respects. First of all, two 16-bit adders are 
required, although the second one is simple because one 

input is zero. Secondly the addition is very slow because 
the carry must propagate through the 16-bit adders. The 

use of carry look ahead (CLA) logic as in FIg. 6 will 

improve both situations. For details of the implementation 
using ECL building blocks see McClellan's paper [26]. 


Notice that the subtracter A-B, can be implemented 


by complementing B and adding it to A. 
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The addition A+B completes the calculation of one output 
of the computational element (see Fig. 3). The result is 


held in the register Ry and then is moved to R, to be written 


Z 
back into memory. For the other output of the CE, the 
quantity A-B is stored in the reclock register R, for 


subsequent rotation by a power of 


a = /2 given by (4.16), 


12 „4 


/Z = 29/4 (90/27) = 216/4(516/2_1) = 2428-1) = 212-2 


as 
(5.28) 
The rotation by V25 is split into two stages. 
In the first stage, the quantity X = A-B is multiplied 


A if K is odd. The Y2 multiplier is merely 


by v2 = A 
a subtracter (5.28). 

A 2:1 multiplexer at the output of the subtracter 
selects whether the input is to be multiplied by /2 or 
by 1, and is controlled by the LSB of K [26]. The result 
of this calculation is stored in the reclock register Sa 
The second stage of the rotation is a multiplication by a 
power of 2, namely [K/2]. ([ ] denotes the greatest 
integer function.) This multiplication is implemented as 
a 16:1 multiplexer controlled by the upper four bits of 
the binary representation of the power of 2. The shifting 


network is followed by a 2:1 multiplexer which selects 


which butterfly output (A+B or 25.y) is to be stored in 


_ 


156 


4 








R, and then written back into memory. This multiplexer 


is controlled by t3 and its output is 


K 


2 - Y) F, +Ws: t 


3 3 


where Y and W are the contents of Ry and Ry respectively. 

McClellan states that the logic for the computational 
element consists of 90 IC's, all of which are located on 
one board (18x16 in.). 

In Section VI, a comparison between the complexity of 
various basic operations involved in Son Sueno Fermat 
Number Transforms vis-a-vis the FFT, in software as well as 
hardware implementations, will be presented. The software 
and hardware implementations discussed there will be those 
described in this section. The results will show only the 
efficiency of these implementations, not all the possibili- 


ties of NTT in general. 
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VI. FERMAT NUMBER TRANSFORM VERSUS FFT 


The FNT provides an efficient and error-free means of 
computing cyclic convolutions. The purpose of this section 
is to compare this method with the standard implementation 
of convolution. Results obtained both by Agarwal and 
McClellan, respectively in software and hardware implemen- 
tation of the FNT, are presented. 

Computation of the FNT of length N requires on the order 
of N log. N additions, bit shifts and subtractions but no 
multiplications. The only multiplications required for 
our FNT implementation of cyclic convolution are the N 
multiplications required to multiply the transforms. This 
is a very efficient technique for computing convolution, 
but unfortunately, the maximum transform length for an FNT 
is proportional to the word length of the machine used. 
Agarwal and Burrus [17] showed that a very practical choice 
of a Fermat number for this application is Fs = A and 


that the FNT mod F. can be implemented on a 32-bit machine, 


5 
namely the IBM 360/370 series computers. 

Suppose one wants to calculate the convolution of two 
sequences x(n) and h(n) having by and b, bit representations, 
respectively, and that the sequence length of both is N. 

Then the output y(n), given by (3.2), would have at the 


most [17] a 


6.1 
b} + b, + log,N (6.1) 
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bit representation. To obtain the correct result b, the 


number of bits of the output, should be [17] 
b > by + b, + log,N (6.2) 


Notice that a better bound on the output can be achieved 
[4]. 

Roughly speaking, one needs twice the number of bits to 
carry out the convolution using the FNT as compared to the 
fixed point FFT implementation of the convolution. But in 
the DFT, every data point is treated as a complex number 
[17] and therefore requires two words, one for the real part 
and one for the imaginary part. Thus in effect the hard- 
ware requirement for two transforms are about the sane. 
Although for real data it is possible to make use of the 
symmetry properties of the DFT, they require extra computa- 
tion and for the purpose of comparison it will be ignored, 
although Agarwal and Burrus had taken this into account for 
their IBM 370/155 implementation. In fact, they assumed, 
in the FFT implementation, each data point is represented 
by a b/2 bit real part and a b/2 bit imaginary part. One 
b/2 bit complex addition is equivalent [17] to two b/2 bit 
real additions, which are equivalent to a b-bit addition 
modulo F,- Thus the complexity of addition/subtraction is 
the same in both the transforms. Similarly, it can be 


shown [4] that a b/2 bit complex multiplication is equivalent 
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to a b-bit multiplication modulo P,- Computation of the 
FNT requires multiplications by a power of 2, which 
implemented as bit shifts and subtractions become much 
Simpler operations compared to complex multiplications 
required in the FFT implementation. 

To compute a length N fast FNT, N log,N additions/ 
subtractions, and (N/2) log. (N/2) "multiplications" by some 
powers of 2 which are implemented as bit shifts and sub- 
tractions. To compute the convolution using the FFT, most 
of the time is taken in computing the complex multiplica- 
tions required to compute the complex multiplications 
required to compute the transforms. A comparison with the 
FNT reveals that these complex multiplications are replaced 
by bit shifts and subtractions which are much faster opera- 
tions. This results in considerable computational savings 
in the implementation of convolution. 

The convolution required to multiply the two transforms 
is about the same for both the implementations. 

To convolve long sequences using the two-dimensional 


FNT (Appendix B), the computational effort increases by, 


at most, a factor of 2 [4]. Still, the FNT implementation 


of convolution is much faster as compared to the FFT 
implementation. 

Fermat number transforms have some additional advantages 
over the FFT. First, the FFT implementation requires storing 
all the powers of a eee 3.7) requiring a Significant amount 
of storage which may be an important factor for a small 


+ 
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minicomputer or a special purpose hardware implementation. 
Second, fixed-point FFT implementation introduces a signifi- 
cant amount of round-off noise at the output, 6-8 bits 
depending on the data [39]. This degrades the signal- 
to-noise ratio during the filtering operations. The FNT 
implementation is error free, the only source of error is 
input A/D quantization. 

Timings for various implementations of convolution and 
their comparison with the FFT implementation are shown by 


Agarwal and Burrus [4]; 


PET FNT 
N . msec msec 
32 16 3.3 
64 31 7.4 
128 60 lO = 
256 123 40.0 ** 
256 | 123 80.0 *** (6.3) 
512 245 16620) 2*% 
1024 530 340 200292 
2048 1260 HADAS = 


* using a = /2 
E* using 2 by 128 convolution 


*** using the two-dimensional FNT 


To compute these timings it is assumed that the transform 
of the h sequence has been precalculated. Thus timings 
are for computing x transforms, multiplications of the 


transforms, and their inverse transforms [4]. 
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For cyclic convolution lengths up to 256, the FNT 
implementation of convolution have a factor of 3 to 5 
speed advantage over the FFT implementation. It takes 
roughly 13-14 usec to compute one butterfly in the compu- 
tation of fast Fermat number transforms [4]. Timings for 
the computation of fast Fermat number transforms, are fairly 
well modeled by multiplying this time by the number of 
Meterflies required in the computation. An assembler 
subprogram was written to compute one butterfly for the FFT 
algorithm [4]. The timing for this was 68 usec, and this 
explains the difference in the timings of the FFT and FNT. 
Agarwal claims that since assembler subprograms were not 
optimized for time, it should be possible to further reduce 
their timings by 10 to 20 percent. Also, to compute one 
butterfly of the fast FNT's, three add/subtract logical 
instructions are required and since the arithmetic is done 
mod 27741, these three instructions are followed by three 
branch instructions (to take into account the carry bit). 
If the hardware was designed to do arithmetic mod En 
these instructions could have been avoided resulting in a 
Significant reduction in the computation time [4]. One 
objective of the construction of FNT convolver by McClellan 
was to evaluate the total system cost of an FNT convolver 
versus a pipeline FFT convolver, primarily in the speed 
regime applicable to radar signal processing. 


The signal bandwidths encountered in radar signal 


processing (10-30 MHz) require a pipeline architecture for 
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either the FNT or FFT [40]. Typically, the overlap-save 
version of high-speed convolution is employed for real-time 
processing of the radar returns and a 50 percent overlap of 
the input data is common [26]. Furthermore, the length of 
the convolution to be implemented is assumed to be large 
(e.g., 512 or greater). Two cases will be considered: 
a length 1024 convolution of real data and a length 1024 
convolution of complex data. 

Four measures of hardware complexity are the basis of 
parison [26]: 

- the number of butterflies for output point, 

- the number of reference spectrum multiplies for 

output point, 
- the total amount of interstage delay line memory in 
the forward and inverse transforms, | 

- and the total amount of reference spectrum memory. 
The FFT implementation will be considered first. For either 
real or complex data, it is assumed that the FFT implemen- 
tation employs an ll-stage radix 2 pipeline FFT in both the 
forward and inverse transform. Notice that for real data, 
it is possible to do a length N transform with one length 
N/2 transform and some. overhead to combine the real and 
imaginary parts [38]. However the overhead amounts to an 
additional butterfly so there is little, if anything to gain 
by using this fact in a pipeline FFT [26]. 

11 


For the case of a length 2 = 2048 pipeline FFT convolver 


the number of butterflies per output point is 
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2 log,N = 22 (6.4) 


assuming 50 percent convolution overlap. Likewise, two 
reference spectrum multiplies must be done per output point 
[26]. The amount of interstage delay line memory can be 


calculated from the formula [26] 


IDM 


(r+1l) (6.5) 


ul 


where r is the radix of the transform. 


| Thus, for two radix-2 pipelines, the total is 
IDM = S(2+1)-2 = 3N = 6144 = 6K 


words of memory. - Finally, the reference spectrum requires 
2K words of memory [26]. A pipeline FNT structure is 
identical to the pipeline FFT except in the butterfly where 


rotation by e121TK/N 


(in the FFT case) is replaced by multi- 
plication by V25. 

Thus many of the results quoted above are applicable to 
the FNT. Since the FNT naturally processes real input data, 
the cases of real and complex convolution require different 
realizations. In both cases, however, a two-dimensional 
implementation of the convolution is required [31]. 


The length 1024 convolution of real signals can be 


implemented with 64x64 transforms [26]. 
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McClellan claims that for the FNT convolver processing 


real data, the following results apply: 


total number of butterflies = 9x27 36 butterflies per 
output point 

total reference function multiplies = zo 4 multiplies per 
output point 

total interstage delay memeory = 6-4 K 

amount of reference function memory = 4K. 


If the signal to be convolved is complex then one possible 
implementation is to handle the real and imaginary parts of 
data separately. The number of butterflies per output 

point, the interstage delay memory, and the reference spec- 
trum are all doubled. However, the number of real multipliers 
for the reference function multiply iS quadrupled because 

the multiplications are now complex [26]. These results 


can be summarized, assuming 1024 convolutions, by 


FFT Real or FNT Real FNT Complex 
Complex Data Data Data 
Butterflies 22 36 72 
Reference 2 2 16 
Multipliers 
Interstage 6K 6-4K 12-8 K 


Bu Memory complex words a real words 2 real words 


Reference 2K 4K 8K 
spectrum 
Memory complex words real words real words 





a One complex word will contain approximately 27 bits 
for typical high-precision radar applications. 


b One real word contains 33 bits for FNT. 
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Notice that the FNT always requires more memory and more 
computational elements than the FFT. Hardware savings are 
possible because most of the hardware cost of the FFT is 
concentrated in the butterfly elements (up to 80 percent) 
[26] and because the FNT butterfly requires from one-third 
to one-sixth the hardware of an FFT butterfly. These 
remarks apply to the FNT where the data to be convolved are 
real, but where the data are complex, the situation becomes 
much worse because all measures of hardware complexity are 
increased by a factor of 2 or 4. 

McClellan conclude that the FNT is a useful alternative 
to the FFT if the signal to be filtered is real and the 
computational elements are a major part of the overall 
system cost as in a pipeline architecture. Furthermore, 
for short length convolution (e.g., length 64) and for 


two-dimensional convolution the savings may be significant. 
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VII. CONCLUSIONS 


It has been shown that, by working in a finite field or 
ring of integers modulo M, a large class of transforms 
exist that have the cyclic convolution property, i.e., the 
transform of cyclic convolution of two sequences is equal 
to the product of their transforms. These transforms are 
called Number Theoretic Transforms (NTT's) and they are a 
computationally efficient approach to performing the dis- 
crete convolution function. These NTT's are truly digital 
transforms, taking into account the quantization in amplitude 
and the finite precision of digital signals. They bear the 
same relation to digital signals as the DFT does to discrete- 
time or sample data signals and the Fourier or Laplace 
transforms do to continuous time signals. 

When working with digital machines, the data are avail- 
able only with some finite precision, and therefore, without 
loss of generality, the data can be considered to be inte- 
gers with some upper bound. To compute E in this 
digital domain, operations in the complex number field of 
the continuous domain can be imitated in a finite field, or 
more generally, in:a finite ring of integers under additions 
and multiplications modulo some integer M, with an integer 
a of order N, replacing = Us in the Discrete Fourier 
Transform (DFT). 

In this ring, when two integer sequences x(n) and h(n) 


are convolved, the output integer sequence y(n) is congruent 
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to the conventional convolution of x(n) and h(n), modulo M. 
In the ring of integers modulo M, conventional integers 
can be, unambiguously, represented if their absolute value 
is less than M/2. If the input integer sequence x(n) and 
the filter sequence h(n) are so scaled that |y(n)| never 
exceeds M/2, one would get the same SED implementing 
convolution in the ring of integers mod M as that obtained 
with normal arithmetic. 

By special choices of the length N, the mod M, and 
the value a, it is possible to have transforms that need 
only word shifts and additions but no multiplications, 
that have an FFT type algorithm, that do not require storage 
of complex values of a and that have no round-off errors. 

It has been shown that Mersenne transforms with 
M=p= 24-1, q a prime, and a = -2, have the transform 
length equal to N = 2q and therefore do not have an FFT 
type fast computational algorithm. 

The best known number theoretic transform is the 
Fermat Number Transform (FNT), where M = Be t a positive 
integer. For FNT's with a prime or composite modulus it 
was verified that a = 2 or a power of 2 is possible, for 
sequences up to N = A This is a very desirable situa- 
tion since N is highly composite allowing an FFT type 
algorithm and all multiplications by powers of a are simple 
word shifts. If a = /2 is used then sequences of length 


t+2 


N = 2 are possible, thus increasing the maximum sequence 


length permissible. 
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Assembler programs on the IBM 370/155 computer, written 


by Agarwal, showed that for cyclic convolutions length 


up to 256, 


factor of 


the FNT implementations of convolution have a 


3 to 5 speed advantage over the FFT implementa- 


tions. The reasons for the speed up are: 


l - The Fermat number transform requires no multiplica- 


tions and, therefore, the implementation of 


convolution requires only N multiplications for 


an N point convolution. The number of additions 


and subtractions (together) for a convolution is 


2N log,N and there are N Log, N required "multipli- 


cations" by a power of two. 


2 - Only real operations are required. This buys about 


two to one savings over the FFT requirements. 


3 - The Fermat number transform is able to compute an 


exact convolution thus allowing a program to avoid 


the need for either floating point arithmetic or 


overflow checks or other precautions. 


The computation required to multiply the two transforms is 


about the 
sequences 
increases 


mentation 


same for both implementations. To convolve long 
using two dimensional FNT, the computational effort 
by, at most, a factor of 2. Still the FNT imple- 


of convolution is much faster as compared to the 


FFT implementation. 


Fermat number transforms have some additional advantages 


over the FFT. First, the FFT implementation requires storing 
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all powers of qa requiring a significant amount of storage 
which may be an important factor for a small minicomputer 
ora special purpose hardware implementation. 

Second, fixed-point FFT implementation introduces a 
Significant amount of round-off noise at the output 6-8 
bits depending on the data [38]. This degrades the signal- 
to~noise ratio during the filtering operations. The FNT 
is error free, the only source of error is input A/D 
quantization. 

In the realm of radar signal processing the potential 
for higher throughput is worth exploring. For this reason 
McClellan designed and constructed a small prototype FNT- 
convolver. An important element in the design of the FNT- 
convolver was a new coding scheme for the data, although 
Simpler codes are possible. The experience derived from 
designing and building this hardware serves as the basis for 
estimates of the site of large pipeline FNT convolvers, for 
use in radar matched filtering applications. The result of 
the hardware comparison of the FNT versus a pipeline FFT, 
indicates that the anticipated savings of the FNT can be 
realized for small systems (e.g., length 64 convolution), 
when the signal to be filtered is real. However, in larger 
systems where nee use two-dimensional convolution to 
implement one dimensional convolution, the savings in 
multiplier hardware are offset by increased transform size 


and the corresponding increase in memory size and reference 


170 








spectrum multiplier hardware. In this case, when the signal 
to be filtered is real, the FNT still offers a potential 
Savings in the amount of hardware versus the FFT. When the 
Signal is complex valued the amount of FNT hardware approxi- 
mately doubles and is much greater than the pipeline FFT. 

In the implementation of two-dimensional convolution there 
is no penalty due to increased memory size and the savings 
in multiplier hardware will translate into savings for the 
Overall convolver system. 

It was shown that the main drawbacks of' FNT's is a 
rigid relationship between word length and the obtainable 
transform length and a limited choice of possible word lengths. 
This last point is especially significant, since FNT's are 
restricted to word sizes equal to q = 2 t an integer. As 
gq increases very rapidly witht, the choice of possible word 
lengths is very limited, and most practical digital filtering 
applications, when implemented with FNT, are constrained to 
word lengths of 16, 32 or 64 bits. If the dynamic range 
required for convolution does not correspond to q = 2 
choosing the next higher value of q may result in a signifi- 
cant waste of computing power. 

Various solutions to these problems involving either 
two-dimensional techniques, the use of "the Chinese Remainder 
Theroem", or other NTT's has been discussed. 

Along these lines, it was shown that transforms over 
the Galois Field GF(p*), can be found which do not introduce 


round-off errors and which can be used to compute 
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information-lossless convolutions of sequences of complex 
numbers, by a FFT type algorithm. A disadvantage of this 
transform is that multiplications by powers of the primitive 
element (a) is not as simple as in Mersenne or FNT's. 

It was verified that a Fourier-like transform in GF(p) 
where p is a prime of the form An = A ee n a positive 
integer, is possible. This transform increases the variety 
of methods and the digital word lengths that can be used 
for computing the convolution of integers beyond the above 
mentioned Fermat or Mersenne Number Transforms. Also, a 
special NTT that can be computed using a high-radix fast 
Fourier type algorithm, defined on arithmetic modulo primes 
Of the form DA was discussed. 

Complex Mersenne transforms that can be computed without 
multiplications were presented. These transforms are very 
promising for computing convolutions because they can be 
partly computed with the FFT type algorithms and some of 
the operations can be performed on words of reduced length. 
Complex Mersenne transforms also have the advantage of 
permitting operations on transform lengths and convolution 
lengths up to four times longer than is possible with 
conventional Mersenne transforms. 

These results have been extended to cover the case of 
transforms oerating in a ring modulo a pseudo Mersenne 
number or submultiple of such a number. It was verified that 


some of these transforms have a highly composite transform 
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length and therefore can be computed with an efficient 
FFT-type algorithm. In the same way pseudo FNT's defined 

in a ring which is a submultiple of a Fermat number and 

can be considered as a generalization of FNT's. allow a much 
wider choice of possible word lengths, and therefore are 
well adapted for evaluating convolutions. As an extension 
the case of complex transforms was considered. 

Finally, complex pseudo FNT's were presented, that 
allow a length double that of FNT's, and part of the calcu- 
lations to be performed on words of reduced length. Some 
of these transforms have a highly composite number of terms 
and are therefore well suited for computing complex convo- 
lutions with an efficient FFT-type algorithm. 

Recently [32] a number of three bit primes have been 
discovered which make possible very efficient fast number 
transforms approaching that of the FNT, but permitting much 
larger transform lengths suitable for convolving the large 
arrays met in picture processing and electron microscopy in 
particular. If a method of implementing arithmetic modulo 
three bit primes comparable to that already developed for 
implementing arithmetic modulo a Fermat number, could be 
found, very efficient fast convolution would become possible 
for a very large range of array sizes. 

Rader and Brenner [41] have introduced an alternative 
form for the FFT. This new form has the advantage that none 
of the multipliers is complex, but in the usual complex 


field, most are pure imaginary. It has the disadvantage 
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that one must divide by very small numbers and therefore 
aggravate any quantization noise problems. This new 
algorithm may be applied to the complex NTT without this 
disadvantage. 

Winograd [42] has shown how to perform a short-length 
D.F.T. of length N= por př (where p is a prime) in a very 
efficient way. Winograd's algorithm uses the fact that 
a = l and requires that all operations be performed in a 
field. Hence, providing one chooses the modulus M to be a 
- Prime number, one may perform the NTT by using Winograd's 
algorithm. 

A final remark is in order. Whether or not an engineering 
method is useful becomes clear only when that method is 
evaluated by the wide community of potential users, who 
consider it in relation to their needs. Since so many 
engineers and programmers are not familiar with number. 
theory it is an open question whether NTT's algorithms will 


ultimately prove to be of great or small importance. 
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APPENDIX A 


TWO DIMENSIONAL CONVOLUTION FOR CONVOLVING LONG SEQUENCES 


Arithmetic mod Fy (a Fermat number) can be implemented 
using b = >= bit representation of integers. In Section IV, 
it has been shown that the maximum length of sequences which 
can be cycled convolved using the FNT with a = 2 is N = 2b 
and therefore the length of sequences which can be convolved 
is proportional to the word length in bits. Thus for long 
Sequences the word length requirement may be excessive. 

Rader [3] suggested uSing a two dimensional convolution 
scheme to convolve long one-dimensional sequences and Agarwal 
and Burrus [17,18] presented such a two dimensional convolu- 
tion scheme. Using this scheme, cyclic convolution of 
length N = LP is implemented as a two dimensional cyclic 
convolution of length 2LxP. 

This two-dimensional cyclic convolution can be implemented 
using a two dimensional FNT. Using this two dimensional 
scheme, the word length required is proportional to the 
Square root of the length of the sequences to be convolved 
which would give for a maximum sequence length gb? rather 


than 4b. I.e., for a computer word length b = 64 
N for a = /2 would be 32768! 


In the following pages an example illustrative of the two 
dimensional convolution using arithmetic modulo a Fermat 
number and FNT's is worked out. 
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Let x(n) and h(n), n= 0, 1, ... N-l, be two sequences 
which need to be cyclically convolved. Let N = LP. 

We construct two (2IxP) two-dimensional sequences 
h(i,j) and x(K,2) from x(n) and h(n) respectively as 


shown below. 


h(i,j) = h(jL +i - Lb) (1) 
ae Ot aaa: 


J = Ol; eee gs pI: 


And 
x (2L+K) K = Ol; e. .».y} L-1 
x(K,2) = (2) 
0 KR = L,L+tl, 222; 2L-1 
Qe = OG. 0 
Example: 


Two dimensional convolution using FNT's. 


Let 


x(n) = (2,-2,1,0) and h(n) = (1,2,0,0) 


Using Fermat number transforms modulo Fy = 17, 


x(n) = (2,15,1,0) and h(n) = (1,2,0,0) 
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Note that the second element is changed from -2 to 15. 
Notice that N= 4 = LxP = 2x2. 

One constructs two (2L xp) two dimensional sequences 
m and x (K, 2) from x(n) and h(n), according to (1) 
and (2). 


It turns out 


0 1 2 1 
x 0 2 ~ 15 0 
h(i,j) = x(K,2) = 

1 0 0 0 

2 0 0 0 


Now taking two dimensional transforms of h and x yields H 


~ 


and X. 


p-1 2L-1 
u ; u Br im jn 
H(mñ) = )  ) h(i,j)a, ar (3) 
j=0 i=0 
mM = 0,l, -zp 2L-1 Xor, 7 is an element of order 2L 
O cean Pl ap 7 is an element of order p 
and a similar expression for x(m,n). 
Remembering that one is using FL = 17 as the modulus 
= =. = = 16 
Gyr, Ay 3 and Ao Ao 
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This can be found by noting that 3 and 6 primitive roots 
of the ring in consideration will generate all the positive 
integers in the ring (2.16). 

Also, by (2.20), if a is a root of order N then 


a is of order N/K if K|N (K divides N) 


ar is of order N if N and K are relatively prime. 


Thus, With these considerations in mind one finds 


= 16/4 _ 4 _ 
Gor = Oy 3 = 3 = 13 (mod 17) 
a = 13 order 4 
a = a 316/2 o 
p 2 
a = 16 order 2 


Now one can return to the calculation of the two dimensional 


transforms. The first term will be 


(0,0) 136) (9) 4669) (0) , 50,1) 1360) (0) 16 0) 


H(0,0) 


(1,0) 132) (9) 4669) (0) na, 13 62) (9) 76 (2) (0) 


+ 


(2,0) 1362) (9) 3660) (0) 4 he, 1362) (0) 46 2) (0) 


+ 


+ 


h(3,0) 13 (3) (9) 4660) (9) | h(3,1) 1363) (0) 4,604) (0) | 
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(0,0) 


= h(0,0) @ ni) es = o+1+ = 6 (mod 17) 
+ h (1,0) + h(1,1) + O+ 2 + 

+ h(2,0) + h(2,1) + tO 

(Sy Oe h(3,1) 2 + 0 


Now constructing a table that will help in the evaluation of 


M(m,n). 
N= 0 Í 2 3 4 5 6 7 8 9 
1 N 1 bB e z 1 13 
notice order 4. 


N 
6. = 1 16 1 16 1 16 1 16 1 16 
Bee 
notice order 2 


Now proceding: 


H(1,0) = h(2,0) 1362) (1) 4609) (0) 4 neg zy) 1369) (2) 16 (0) 


e 


¿nao 1380 0D ¿00 ¿0 1300) ¿(0 (0) 


since h(0,0) = h(1,0) = h(2,1) = h(3,1) = 0. 
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0) 


H(1,0) = 1 Lee ter El ie e a LO) pele 0 mod 7 


213% 16% +2 131 160) E 


FOr 
H(2,0) = (2,0): 13'2) (2) 4669) (0) y n0,1) 1360) (2) ag (1) (0) 
+ n(3,0) 1363) (2) 4669) (0) +, 211,1) 13902 ¡¿010) 
= 7 13°*) 16°) 44 13°) 169) = 141+ = 15 (mod 17) 
Be onen er 
and, 
eo A A A E A aie 2282) 


E ee ee 


(0) DAD 


A euere ad) 


+ 2137 16% + 2 137 16% + 26 +8 


For the second column, 
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H(0,1) = n(2,0) 13 (Y (0 ¿DD y no) 1300) 46 (1) 2) 
+ n(3,0) 1363) €) 46) 0D ¿21191300 16 (9 
= fea) Gr 2 MV atmodı 17) 
+ 2+ 32 
A A 3 (8) ) gg (2) 
+ 213) (D ¿00 y) 300 ¡¿00) 
= 16+16 = 14 (mod 17) 
+ 8 + 416 
H(2,1) = OO 16 69) (1) , 3 13000 (2) ,6(1) (1) 
t 2.133) (2) 7600) (1) , , 13000 (2) ¿(0 0) 
= 1+16 = 0 (mod 17) 
+ 32 + 512 
H(3,1) = 1390 ¿00 4113 6) ¡¿000) = 16+16 =16 


(mod 1' 


E II ig ae 
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i.e., 


6 0 

a 0 14 
H(m,n) = 

15 0 

lo 16 


two dimensional 


Fermat transform 


The two dimensional transform of 


2 1 
A 15 0 
X = 
0 0 
0 0 
will be given by 
p-1i 2L=1 
X(m, n) = ) ) x(i,j) Toi a 
j=0 i=0 


So 


xc0,0) = (0,0) 1307100) 160100) , (0,1) 13 


(1,0) 1301) 00) 46 60) (0) 


+ 


LALO 1 (mod 17) 


182 


(0) (0) 4¢ (1) (0) 





x(1,0) 


x (2,0) 


x (3,0) 


x (0,1) 


x(1,1) 


O A ay GEO) eG) 


(0) (0) 
x(1,0) „DPD 16 
2+ 1 +15(13) = 11 (mod 17 


x(0,0) 1369) (2) 4600) , 6g 4) 13 (0) (2) 4661) (0) 
x (1,0) 13%) (2) 46 (0) (0) 

2+ 1+ (15)(16) = 6 (mod 17) 
2 1369) (3) ,¿(0) (0) | (1) O) 


2 + 1 + (15)4 = 12 (mod 17) 
O EN 


ea lo Ko 


2 +16 + 15 16 (mod 17) 
(1) (1) 46 (0) (1) 


(15) 13 


2 +16 +8 = 9 (mod 17) 
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(OO 
= 2+ 16 + (15) (16) = 3 (mod 17) 
x(3,1) = (2) 1310) (3) 1600) (1) + (1) 13 (0) (3) 16 (1) (1) 
ra. one 
= 2+ 16+ (15)(4) = 10 (mod 17) 
So 
A 16 
~ TI 9 
X(m,n) = 
5 3 
| 12 10 


two dimensional 


Fermat transform 


Let y(i,j) be two-dimensional cyclic convolution of 


"a 


x and h sequences, then it can be proved that two dimensional 


"a mg 


transform of y is the product of H(m,n) and X(m,n) [17], 


defined by 


p-1 2L-1 
e E 1 2 
n=0 m=0 


i = 0,1, +. o 
a PL 


0,1, 


Li. 
ll 
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) H(m,n)X (m,n)a, 


-jn 


| 





Coming back to the example: 


-y (0,0) H(0,0)X(0,0)137 60) (9) 467 (0) (0) 45g ay x69, 19197 (0) (0) 4 6- (0) (0) 


+ 


+ H(2,0)X(2,0)137 (9) (2) 19> (9) (0) seca ay x (2,2) 137 (0) (1) yg (0) CL) 


+ 


y(0,0) = (6) (1) (1) (1) + (14) (4) (1) (1) + = 10 (mod 17) 


IA LO AO AE) 


y (1,0) = (6) (1) 137 (1) (0), 67 (0) (0) + (14) (9y137 (1) (1) 5 67 (0) (1) 
Br OR een a Kae 


il 2 3 


6 e106. 1s meee 75. be? 49 GO ee 


= 6 (mod 17) 


Notice that in this last calculation use of the following 


table has been made: 


Eu E mo ee: 
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So 


lat = 4 also 16+ = 16 
order 2 
13° = 4° = 16 | 16° = 1 
Bea = 23 E 
Be a {l 16% = 1 
13? = 4? = 4 order 4 PS e 
ERA? = 16 eae =e 
ea a = 13 e =e cil 
E ee a= 
imo = 4 = 4 | 16 = 16 


Continuing with the example: 


~~ 


(6) a) 1372) (0167 (0 3 aao) 1372 (P1670 0) 


y (2,0) = 
+ (15) (5) 137 (2) (2) 67 (0) (0) T (16) (10)137 (2) O) 167 (0 (1) 
= 16 (mod 17) 
MS o y 1 DIO DIO aaan o 13 O Da O) 
= 16 (mod 17) 
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and for the second column, 


y (0,1) 


= (6) (1) 13769) (9) 967 (2) (0) sy 13 (00 (19,67 (1) (1) 
+ (15) (5) 137 DM, 16) 1013 08000) 
= 16 (mod 17) 

van = (6) (1) 137P (167 (0 sy 9) 137 2 (00), 67 00) (1) 
¿ass 13 DRDO, naaa DY 167D 
= 16 (mod 17) 

y(2,1) = (6) (1) 137 (2) (9) MO a) 13 (2) C4 gg CL) 41) 
* (15) (5) 137 2) (2) 467 165 1013 Y Bj 01 (1) 
= 10 (mod 17) 

Meri] (6) 17 IO aay (9) 137 82) 6 
+ as o DRAM (26) (0) 137 3) (F) p> HD) 
= 16 (mod 17) 

Finally, 
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10 16 


P 1 16 16 
y (1,3) = ADT 
O lc 10 
16 16 
and since 
i Es 1 = ein e = = 
BL = ON (2N) = (8 ~) = 15 (mod 17) 
we have 
10 16 14 2 
e 16 16 2 2 
16 10 2 14 
16 16 | | 2 2 


And, since it can be proved that the relationship between 
two dimensional convolution and one dimensional convolution 


is given by [18]: 
Vili) = yo) 


where 


i = 0,1l, ... y L-1 


j = Diales eooey pie 
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In this example: 


~~ 


y(itL,j) y(jL+i) 

G) i=0,4=0 y(0+2,0) = y(2,0) y(0-2+0) = y(0) 
Ra 

@) i=l, j= 0  y(1+2,0) = y(3,0) y (0:2+1) = y(1) 
i.e. y(3,0) = y(1) = 2 

G) i=0, 551 y(o+2,1) = y(2,1) ` y(1-2+0) = y(2) 
i.e. y(2,1) = ¥(2) = 14 

G@ i=1,3=1  y(1+2,1) = y(3,1) y (1-2+1) = y(3) 
i.e. y(3,1) =y(3) = 2 

Note that for the original sequences 

x(n) = (0,1,15,2) and hin) = (1,2,0,0) 
y(n) = x(n) * h(n) = 2,2,14,2. 


Exactly the same result!! 


In taking two dimensional transforms, p and 2L are 
restricted to be a power of 2 and also p < 4b and 2L < 4b 
(b = DE bit representation of integers in arithmetic modulo 


E, Fermat number) in the example: 
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Since F. =2 +1= 17 b=4 t = 2, 


also p=2, L= 2. 


Notice that N = PL < 8b“. 


Thus the length of the sequences that can be convolved 
using two dimensional convolution scheme is proportional 
to the square of the number of bits used in the word length. 
The order in which two dimensional transforms or inverse 
transforms is taken is reversible. But there is some compu- 
tational advantage in taking transform first along the 
direction 2 (length p) and then along the direction 1 (length 


me 


2L), because half the x sequences along direction 2 (p) are 


me 


zero and half the n sequences along direction 2 are cyclic 

shifts by one Bein of the other half n sequences [18]. 
Also while taking the inverse transform, computationally 

it is advantageous to first take the inverse transform 

along direction 1, then along direction 2 [18]. Because 

we need only half the y sequences along direction 2, 


therefore after taking inverse transforms along direction 


1, we can throw away half the terms. 
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APPENDIX B 


BASIC PROPERTIES OF QUADRATIC RESIDUES 


Let 


x = a (mod p) (B.1) 


be a congruence, where p is any odd prime and a is any 


HI 


integer. If a 0 (mod p), then the only solution to (B.1) 
is x = 0 (mod p). Therefore one assumes p a. For some 
values of a, (B.1) will have a solution, whereas for some 


other values of a, (B.1) will have no solution. 


Definition B.1: Let p be a prime, and let a be any 


integer such that p / a. One says that a is a quadratic 


residue modulo p provided that 


x? = a (mod p) | (B.2) 


has a solution. Otherwise, one says that a is a quadratic 
nonresidue modulo p. 

Suppose that p is given and consider the problem of 
determining all quadratic residues modulo p. If aisa 
quadratic residue modulo p, then p f a anda = x“ (mod p) 
for some x. However, since any integer is congruent to one 
of 0,1, ..., p-l(mod p), one sees that a must be congruent 


to one of 
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rn) a (B.3) 


If p is not too large, then this procedure can actually be 
used for computation. 
Example: Let p= 13 


Then a is a quadratic residue modulo 13 if and only if a 


is congruent to one of 


ley 20 Base 125% (moa BS) + 

that is a is a quadratic residue modulo 13 if and only if 
a = 1,4,9,3,12,10,10,12,3,9,4,1 (mod 13). 
Thus the quadratic residues of 13 are 


1, 3, 4, 9, 10, 12. 


Hence the quadratic nonresidues modulo 13 are 


Notice that the initial list of quadratic residues 
obtained is symmetric, with each element of the list 
appearing exactly twice. This is a general phenomenon. 


Indeed, one has 


p-x = -x (mod p) (B.4) 
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so that 


ES mod p (B.5) 


2 
(p=) 
and thus 


“7 (mod p) (B.6) 


lil 


(p-x) i 


therefore, if a is a quadratic residue modulo p, then a 


is congruent to one of 
p=1 
l , 2 , eee # (=> mod p (B.7) 


No two integers of (B.7) are congruent modulo p. Hence, 


among the integers 
alee 2; SN pol 


precisely (p-1)/2 are quadratic residues modulo p and 
precisely (p-1)/2 are quadratic nonresidues modulo p. This 
can be verified in the above example. One finds the 


following notation very convenient: 


Definition B.2: 
Let p be an odd prime and a an integer such that p / a. 


The Legendre symbol is defined as follows: 
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+1 if a is a quadratic residue modulo p 
a — 
E = (B.8) 


-1 if a is a quadratic nonresidue modulo p 


The Legendre symbol (5) should not be confused with the 


fraction a/p. 


Example: Let p=3 
a el eae 4 2 
5 2 
(7) = 1 


See above example, where it is listed the quadratic residues 
modulo 13, and quadratic nonresidues mod 13. Moreover, 


since 
18 = 5 (mod 13) 


and 5 is a quadratic nonresidue modulo 13. So is 18, and 


thus 


Properties of Legendre symbol. 


Let p be an odd prime and let a and b be integers such 
that p / a and p / b. Then the following results hold 


[20]: 
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Proof: 


(1) 


(ii) 


(iii) 


(i) 


(ii) 


(iii) 


) = 1 (B.9) 


H 
Fh 
W 
li 


z b 
b mod th a = = 
p en (5) (5) 


The congruence x° = ae mod p has aS a 


solution x = a. 


Set a = 1 in result (i). 


If a = b(mod p), then the solutions x? = a(mod p) 


are the same as the solutions of x? = Db (mod p); 
Therefore, the first congruence has solutions 


if and only if the second does. Thus, 


b 


QS (5) 


a 
p 


The properties of the Legendre symbol given above are very 


elementary. 


However a property of the symbol which is by 


no means obvious is the following result: 


Theorem 8.3: 


Let p be 


pa. 


Then 


(Euler's Criterion): 


an odd prime and let a be an integer such that 
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(2) ars a) (B.10) 


Proof: 
By Fermat's theorem (2.14), one has 


p-1 


(q (P71) 72) 2 a = 1 (mod p) 


Thus, if 

E 
then 

h? = 1 mod p 
and so 

pl (h-1) (h+2). 
Therefore 

P|h-1, 
or 

padi; 
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and hence 


h= aP711/2 2 ¿1 (moa p). 


Now if p is odd, so 5 = +1 if and only if a ae = 1 (mod p). 


Consequently, ©) = +1 if and only if 


a (9711/27 41 (mod p) 


respectively. 


Corollary B.4: 


Let p be an odd prime, and let a and b be integers such 


that p / a, p/ b. Then 


ab _ a b l 
(I = (5) (5) (B.11) 
Proof: 
By Euler's Criterion 
C a BLZ E ES 2 mod D. 
p P P 
It is an immediate consequence of Corollary (B.4) 
that 
(i) the product of two quadratic residue modulo p is 


a quadratic residue modulo p. 
(ii) the product of two quadratic nonresidue modulo p 


is a quadratic residue modulo p, 
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and 
(111) the product of a quadratic residue and a 


quadratic nonresidue is a quadratic nonresidue. 
Example: Let p= 13 


By previous tons 
3 and 12 are quadratic residues mod 13 
and 
3°12 = 36 = 6 (mod 13) is a quadratic residue. 
However 
2 and 5 are quadratic nonresidues mod 13 
and 
2-5 = 10 = 6 (mod 13) iS a quadratic residue mod 13. 
Finally 


7 is a quadratic nonresidue mod 13 


10 is a quadratic residue mod 13 
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and 


7°10 = 70 5 (mod 13) is a nonresidue. 


Another consequence of Euler's criterion is the following: 


Corollary 8.5: 
let p be an odd prime. Then 


C 


In other words, 


+1 iE p = 1 (mod 4) 
<] e 
oe 
-1 LE p = 3 (mod 4) 
Proof: 
By Euler's criterion, 
(Sy = (2) P17? (mod p) 
Therefore, since (>) = +1] and since p > 2, we have the 
desired result. 
Notice that 
AE 2) = (=) = (1)? 
p p p p 
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since i) = l by (B.9), therefore, a = -af (mod p) is 


solvable if and only if p = 1 (mod 4). 


Example: 
Let 
DR 
x =19 (mod 23) 
Now 
19 = -4 (mod 23) 
So 


_ = _ -1 ae F 
(53) = (53) = (53) (53) = 1 


Since 23 = 3 (mod 4). Thus x” = 19 (mod 23) is not solvable. 
How does one go about computing 5 for p/a? 
Suppose that 


a a 


u 1 t 
a = Py eee Pr 


where P,, ---, P, are distinct primes. Since p Y a, one 


sees that p # P;- Then by Corollary 4, one has 
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Example: 


Let p= 5 
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